logicentity.com

WebSocket Protocol Fundamentals

Pradeep Davuluri · February 27, 2026 · 12 min read

Why HTTP Falls Short for Real-Time

HTTP is a request-response protocol. The client sends a request; the server sends a response; the connection is logically complete. For real-time data — chat messages, live sports scores, collaborative editing, multiplayer game state — this model breaks down because the server needs to push data to the client without being asked. HTTP has no mechanism for server-initiated messages.

Before WebSocket, developers worked around this limitation with three hacks. Short polling: the client sends a new HTTP request every N seconds to check for updates. Simple but wasteful — most responses are empty, and the polling interval creates a latency floor. Long polling: the client sends a request, and the server holds it open until there's data to send (or a timeout). Better latency, but each response requires a new request, and the overhead of HTTP headers on every exchange adds up. Server-Sent Events (SSE): a long-lived HTTP response where the server streams events. Clean for server-to-client push, but unidirectional — the client can't send messages back over the same connection.

WebSocket solves this by establishing a persistent, full-duplex communication channel over a single TCP connection. Both client and server can send messages at any time, independently, with minimal framing overhead (2–14 bytes per message vs. hundreds of bytes for HTTP headers). The connection stays open until explicitly closed, eliminating the overhead of repeated handshakes.

The Upgrade Handshake

A WebSocket connection begins as a regular HTTP/1.1 request. The client sends a GET request with special headers requesting an "upgrade" from HTTP to the WebSocket protocol. The server, if it supports WebSocket at the requested path, responds with 101 Switching Protocols. After this handshake, the TCP connection is repurposed: HTTP framing stops, and WebSocket framing begins. No new connection is needed — the same TCP socket is reused.

--- Client Request ---
GET /chat HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
Sec-WebSocket-Protocol: chat, json
Origin: https://example.com

--- Server Response ---
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Protocol: json

The Sec-WebSocket-Key is a random 16-byte value, base64-encoded, generated by the client. The server concatenates it with a fixed GUID (258EAFA5-E914-47DA-95CA-C5AB0DC85B11), computes the SHA-1 hash, and returns the base64-encoded result as Sec-WebSocket-Accept. This proves the server understood the WebSocket protocol — it's not a security mechanism (no encryption, no authentication), it's a protocol validation step that prevents misconfigured HTTP servers from accidentally accepting WebSocket connections.

The Sec-WebSocket-Protocol header implements subprotocol negotiation. The client proposes one or more application-level protocols (e.g., graphql-ws, mqtt, json); the server selects one. This allows both sides to agree on the message format before the first data frame is sent.
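On the server, negotiation reduces to picking the first offered protocol the server implements. A sketch of that selection logic — the SUPPORTED list and function name are illustrative; the ws library exposes a similar hook via its handleProtocols option:

```javascript
// Hypothetical list of subprotocols this server implements
const SUPPORTED = ['json', 'graphql-ws'];

// Given the client's Sec-WebSocket-Protocol offers, pick the first match.
// Returning false means no match: the header is omitted from the 101 response.
function selectSubprotocol(offered) {
  for (const proto of offered) {
    if (SUPPORTED.includes(proto)) return proto;
  }
  return false;
}

selectSubprotocol(['chat', 'json']); // → 'json', as in the handshake above
```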

The HTTP/1.1 requirement

The WebSocket upgrade handshake is an HTTP/1.1 mechanism — it uses the Upgrade header, which is specific to HTTP/1.1. It does not work over HTTP/2. RFC 8441 defines an extension for bootstrapping WebSocket over HTTP/2 using the CONNECT method with an :protocol pseudo-header, but browser support is limited. In practice, most WebSocket connections use HTTP/1.1 for the handshake and then operate over a dedicated TCP connection, separate from the HTTP/2 connection used for regular requests.

Frame Anatomy: Opcodes, Masking, and Payloads

After the handshake, all data flows as WebSocket frames. Each frame has a compact binary header (minimum 2 bytes) followed by the payload. Understanding the frame format is essential for debugging, performance tuning, and implementing custom WebSocket servers.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |           (16/64)             |
|N|V|V|V|       |S|             |   (if payload len == 126/127) |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+-------------------------------+
|                  Masking key (if MASK set)                    |
+---------------------------------------------------------------+
|                       Payload Data                            |
+---------------------------------------------------------------+

FIN bit: Indicates whether this frame is the final fragment of a message. Large messages can be split across multiple frames (fragmentation), with FIN=0 on continuation frames and FIN=1 on the last frame. This enables streaming large messages without buffering the entire payload in memory.

Opcode (4 bits): Identifies the frame type. 0x1 is a text frame (UTF-8 encoded). 0x2 is a binary frame. 0x8 is a connection close. 0x9 is a ping. 0xA is a pong. 0x0 is a continuation frame (for fragmented messages).

MASK bit and masking key: All frames sent from client to server must be masked with a 32-bit key. Masking is a security measure against cache poisoning attacks on intermediary proxies — it ensures WebSocket frame data can't be confused with HTTP responses by transparent proxies. Server-to-client frames are never masked. The masking algorithm is a simple XOR: each byte of the payload is XORed with mask[i % 4].
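Because XOR is its own inverse, a single function both masks and unmasks. A minimal sketch (applyMask is an illustrative name):

```javascript
// XOR each payload byte with the corresponding masking-key byte.
// Applying it twice with the same key restores the original payload.
function applyMask(payload, maskKey) {
  const out = Buffer.alloc(payload.length);
  for (let i = 0; i < payload.length; i++) {
    out[i] = payload[i] ^ maskKey[i % 4];
  }
  return out;
}

const mask = Buffer.from([0x12, 0x34, 0x56, 0x78]); // random 32-bit key
const masked = applyMask(Buffer.from('hello'), mask);
applyMask(masked, mask).toString(); // → 'hello'
```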

Opcode        Hex   Type     Payload
Text          0x1   Data     UTF-8 encoded string
Binary        0x2   Data     Arbitrary bytes
Close         0x8   Control  Optional: 2-byte status code + reason
Ping          0x9   Control  Optional application data (≤125 bytes)
Pong          0xA   Control  Must echo the ping's payload
Continuation  0x0   Data     Continues a fragmented message

Payload size encoding

If the payload is 0–125 bytes, the length fits in the 7-bit field. If 126–65535 bytes, the field contains 126 and the next 2 bytes hold the actual length (16-bit unsigned). If larger, the field contains 127 and the next 8 bytes hold the length (64-bit unsigned). This variable-length encoding means tiny messages (the common case for real-time apps) have only 2 bytes of framing overhead — client frames add 4 bytes for the mask, totaling 6 bytes.
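The encoding rule can be expressed directly. A sketch of the length portion of the header (extension bytes only; the FIN/opcode byte and the mask bit are omitted for brevity):

```javascript
function encodePayloadLength(len) {
  if (len <= 125) return Buffer.from([len]);  // fits in the 7-bit field
  if (len <= 0xffff) {
    const buf = Buffer.alloc(3);              // 126 + 16-bit extended length
    buf[0] = 126;
    buf.writeUInt16BE(len, 1);
    return buf;
  }
  const buf = Buffer.alloc(9);                // 127 + 64-bit extended length
  buf[0] = 127;
  buf.writeBigUInt64BE(BigInt(len), 1);
  return buf;
}

encodePayloadLength(100).length;    // → 1 (tiny message: 2-byte header total)
encodePayloadLength(70_000).length; // → 9
```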

The Browser API

The browser's WebSocket API is deliberately simple — four events and two methods cover the entire interface. The complexity of framing, masking, fragmentation, and ping/pong is handled by the browser engine. Application code works with high-level messages, not raw frames.

const ws = new WebSocket('wss://api.example.com/chat', ['json']);

// Connection opened
ws.addEventListener('open', () => {
  console.log('Connected, subprotocol:', ws.protocol);
  ws.send(JSON.stringify({ type: 'join', room: 'general' }));
});

// Message received (text or binary)
ws.addEventListener('message', (event) => {
  if (typeof event.data === 'string') {
    const msg = JSON.parse(event.data);
    handleMessage(msg);
  } else {
    // Binary data: Blob (default) or ArrayBuffer
    handleBinaryData(event.data);
  }
});

// Connection closed
ws.addEventListener('close', (event) => {
  console.log(`Closed: ${event.code} ${event.reason}`);
  // event.code: 1000 = normal, 1001 = going away, 1006 = abnormal
});

// Error (always followed by close)
ws.addEventListener('error', (event) => {
  console.error('WebSocket error', event);
});

The send() method accepts strings (sent as text frames), ArrayBuffer/ArrayBufferView (sent as binary frames), or Blob (sent as binary frames). Set ws.binaryType = 'arraybuffer' to receive binary messages as ArrayBuffer instead of Blob — this avoids the async read step that Blob requires and is essential for low-latency binary protocols.

The bufferedAmount property reports how many bytes of data have been queued via send() but not yet transmitted. Monitor this to implement backpressure — if bufferedAmount exceeds a threshold, stop sending until the buffer drains.
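A minimal backpressure sketch, assuming a hypothetical 1 MB high-water mark — messages queue locally while the socket's buffer is above the mark and flush once it drains:

```javascript
const HIGH_WATER_MARK = 1024 * 1024; // hypothetical threshold: 1 MB

class BackpressureSender {
  #ws;
  #queue = [];

  constructor(ws) {
    this.#ws = ws;
  }

  send(data) {
    this.#queue.push(data);
    this.flush();
  }

  // Push queued messages until the socket's buffer exceeds the mark;
  // call again later to continue once bufferedAmount drops.
  flush() {
    while (this.#queue.length > 0 && this.#ws.bufferedAmount < HIGH_WATER_MARK) {
      this.#ws.send(this.#queue.shift());
    }
  }
}
```

The browser API exposes no drain event, so in practice flush() would be driven by a short setInterval or invoked before each send.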

Practical takeaway

Always set ws.binaryType = 'arraybuffer' if you're working with binary data. Always check ws.readyState === WebSocket.OPEN before calling send(). Always handle the close event — the connection will close eventually, and your application must handle it gracefully. Never assume the connection is permanent.

Message Framing and Application Protocols

WebSocket provides a reliable message transport, but it doesn't define the message format. The protocol delivers complete messages (handling fragmentation internally), but it's your responsibility to define the structure of those messages — what fields they contain, how to route them, and how to handle errors. This is the application protocol layer.

The most common approach for web applications is a JSON-based message envelope with a type field for routing. This is human-readable, easy to debug, and sufficient for most real-time features. For high-throughput or latency-sensitive use cases, binary formats like Protocol Buffers, MessagePack, or FlatBuffers reduce payload size and parsing overhead.

// JSON message envelope pattern
type ClientMessage =
  | { type: 'join';   room: string }
  | { type: 'leave';  room: string }
  | { type: 'chat';   room: string; text: string }
  | { type: 'ping';   ts: number };

type ServerMessage =
  | { type: 'chat';      room: string; user: string; text: string }
  | { type: 'presence';  room: string; users: string[] }
  | { type: 'error';     code: string; message: string }
  | { type: 'pong';      ts: number };

// Type-safe message dispatcher
function dispatch(raw: string) {
  const msg: ServerMessage = JSON.parse(raw);
  switch (msg.type) {
    case 'chat':      return handleChat(msg);
    case 'presence':  return handlePresence(msg);
    case 'error':     return handleError(msg);
    case 'pong':      return handlePong(msg);
  }
}

Established subprotocols

Several application-level WebSocket protocols are widely standardized: graphql-ws for GraphQL subscriptions, mqtt for IoT messaging, stomp for message broker integration, and wamp for RPC + pub/sub. Using a standardized subprotocol gives you library support on both client and server, interoperability between implementations, and a well-tested message lifecycle (connection init, keep-alive, error handling, graceful shutdown).

Heartbeats, Reconnection, and Resilience

WebSocket connections are long-lived, and long-lived connections fail. Mobile users switch networks. Laptops go to sleep. Load balancers time out idle connections (AWS ALB defaults to 60 seconds, nginx to 60 seconds, Cloudflare to 100 seconds). NAT devices and firewalls drop connections that haven't transmitted data recently. Your application must handle connection loss gracefully.

Heartbeats (Ping/Pong)

The WebSocket protocol includes native ping and pong frames. Either side can send a ping; the other must respond with a pong echoing the same payload. Browsers automatically respond to server pings (you can't intercept them in the browser API), but they don't send pings. You need to implement an application-level heartbeat to detect dead connections.

// Application-level heartbeat
class HeartbeatManager {
  #ws;
  #interval;
  #missed = 0;

  constructor(ws, intervalMs = 30000, maxMissed = 3) {
    this.#ws = ws;
    this.#interval = setInterval(() => {
      if (this.#missed >= maxMissed) {
        this.stop(); // don't keep pinging a connection we're about to close
        this.#ws.close(4000, 'Heartbeat timeout');
        return;
      }
      if (this.#ws.readyState === WebSocket.OPEN) {
        this.#ws.send(JSON.stringify({ type: 'ping', ts: Date.now() }));
        this.#missed++;
      }
    }, intervalMs);
  }

  // Call from the message handler whenever a matching pong arrives
  receivedPong() {
    this.#missed = 0;
  }

  stop() {
    clearInterval(this.#interval);
  }
}

Reconnection with Exponential Backoff

When a connection drops, the client should reconnect automatically. Naive reconnection (retry immediately, every time) causes thundering herd problems when a server restarts — thousands of clients reconnect simultaneously, overwhelming the server. Exponential backoff with jitter spreads out reconnection attempts.

class ReconnectingWebSocket {
  #url;
  #ws;
  #attempt = 0;
  #maxDelay = 30000;

  constructor(url) {
    this.#url = url;
    this.#connect();
  }

  #connect() {
    this.#ws = new WebSocket(this.#url);

    this.#ws.onopen = () => {
      this.#attempt = 0; // reset on successful connection
    };

    this.#ws.onclose = (event) => {
      if (event.code === 1000) return; // intentional close

      // Exponential backoff with jitter
      const base = Math.min(1000 * 2 ** this.#attempt, this.#maxDelay);
      const jitter = base * (0.5 + Math.random() * 0.5);

      setTimeout(() => this.#connect(), jitter);
      this.#attempt++;
    };
  }
}

The thundering herd

If your server restarts and 10,000 clients all reconnect at the same instant (attempt 0, delay 0), the server is immediately overwhelmed. The jitter in the backoff formula is critical — it randomizes the reconnection time within each backoff window, spreading the load. Without jitter, exponential backoff just synchronizes all clients at the same delayed timestamp.

Scaling WebSockets: Horizontal Architecture

A single Node.js process can handle 100,000+ concurrent WebSocket connections (each connection is ~10KB of memory). The bottleneck isn't connections — it's message routing. When user A in room "general" sends a message, it must be delivered to all other users in "general" — who might be connected to different server instances behind a load balancer.

The standard solution is a pub/sub backbone. Each server instance subscribes to a shared message bus (Redis Pub/Sub, NATS, Kafka). When a message arrives on server 1, it publishes to the bus. Server 2 and server 3 receive the message from the bus and forward it to their local connections that are subscribed to the relevant room.

// Horizontal WebSocket architecture with Redis pub/sub
import { WebSocketServer, WebSocket } from 'ws';
import { createClient } from 'redis';

const wss = new WebSocketServer({ port: 8080 });
const pubClient = createClient();
const subClient = pubClient.duplicate();
await Promise.all([pubClient.connect(), subClient.connect()]);

// Local room membership: room → Set of WebSocket connections
const rooms = new Map<string, Set<WebSocket>>();

// When a message is published from ANY server instance,
// deliver it to local connections in the target room
await subClient.pSubscribe('room:*', (message, channel) => {
  const roomId = channel.replace('room:', '');
  const members = rooms.get(roomId);
  if (!members) return;

  for (const ws of members) {
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(message);
    }
  }
});

// Track local room membership; publish chat messages to Redis
wss.on('connection', (ws) => {
  ws.on('message', (raw) => {
    const msg = JSON.parse(raw.toString());
    if (msg.type === 'join') {
      let members = rooms.get(msg.room);
      if (!members) {
        members = new Set();
        rooms.set(msg.room, members);
      }
      members.add(ws);
    } else if (msg.type === 'chat') {
      pubClient.publish(`room:${msg.room}`, raw.toString());
    }
  });

  // Drop the connection from every room on disconnect
  ws.on('close', () => {
    for (const members of rooms.values()) members.delete(ws);
  });
});

Sticky Sessions vs. Stateless Routing

WebSocket connections are stateful — a client connects to a specific server instance and stays connected. Load balancers must route WebSocket upgrade requests to a server and then pin all subsequent frames to the same backend. This is sticky sessions (also called session affinity). Most cloud load balancers support this via cookies or connection-based routing.

The alternative is to make the server layer entirely stateless — each server instance subscribes to the pub/sub backbone, and connection state (room membership, user metadata) is stored in Redis rather than in local memory. Any server can handle any connection. This simplifies scaling (add/remove instances freely) but adds latency for every state lookup.
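The stateless approach can be sketched with the storage interface abstracted out — the class and method names below are illustrative; backed by node-redis, sAdd / sRem / sMembers would hit a shared Redis instance instead of local memory:

```javascript
// Room membership kept in a shared store instead of process memory.
// InMemoryStore is a stand-in for illustration; a Redis-backed store
// would implement the same three methods against a shared instance.
class InMemoryStore {
  #sets = new Map();
  async sAdd(key, member) {
    if (!this.#sets.has(key)) this.#sets.set(key, new Set());
    this.#sets.get(key).add(member);
  }
  async sRem(key, member) {
    this.#sets.get(key)?.delete(member);
  }
  async sMembers(key) {
    return [...(this.#sets.get(key) ?? [])];
  }
}

class RoomRegistry {
  constructor(store) { this.store = store; }
  join(room, userId)  { return this.store.sAdd(`room:${room}`, userId); }
  leave(room, userId) { return this.store.sRem(`room:${room}`, userId); }
  members(room)       { return this.store.sMembers(`room:${room}`); }
}
```

With the shared store, any server instance can answer members() for any room, regardless of which instance holds the user's socket — the trade being one store round-trip per lookup.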

WebSocket vs. SSE vs. Long Polling vs. WebTransport

WebSocket is not the only real-time transport available. Choosing the right one depends on your directionality requirements, infrastructure constraints, and performance needs.

Transport           Direction                Protocol           Reconnect  Best For
WebSocket           Bidirectional            TCP (dedicated)    Manual     Chat, gaming, collaboration
Server-Sent Events  Server → Client          HTTP (long-lived)  Automatic  Notifications, feeds, dashboards
Long Polling        Pseudo-bidirectional     HTTP (repeated)    Inherent   Legacy compatibility
WebTransport        Bidirectional + streams  QUIC / HTTP/3      Manual     Low-latency gaming, streaming

SSE is the right choice when the server pushes data to the client but the client doesn't send real-time data back (or sends it via standard HTTP requests). SSE is simpler than WebSocket — it's just HTTP, works through all proxies, supports automatic reconnection with Last-Event-ID, and multiplexes over HTTP/2. For dashboards, news feeds, notification streams, and live score updates, SSE is often a better engineering choice than WebSocket.

WebTransport (supported in Chromium-based browsers since Chrome 97; support elsewhere is still maturing) is the next generation. Built on HTTP/3 and QUIC, it provides bidirectional communication with both reliable streams (like WebSocket) and unreliable datagrams (like UDP). The unreliable channel is crucial for latency-sensitive applications like real-time gaming and video — where a late packet is worse than a lost packet. WebTransport also supports multiple concurrent streams over a single connection, unlike WebSocket's single-stream model.

Decision framework

Need server push only? Use SSE — simpler, more resilient, proxy-friendly. Need bidirectional real-time? Use WebSocket — universal support, mature ecosystem, well-understood. Need ultra-low-latency with unreliable delivery? Use WebTransport — when browser support is sufficient for your audience. Need maximum compatibility? Use long polling as a fallback when WebSocket and SSE are blocked (rare today, but happens in some corporate environments). Libraries like Socket.IO handle this transport negotiation automatically.