Sunday, February 23, 2025
WebSocket

Updated: April 2025

Introduction
WebSocket is a protocol that facilitates full-duplex communication channels over a single TCP connection, enabling real-time data exchange between clients and servers.
How it Works
- The WebSocket connection begins with an opening handshake, which is compatible with HTTP, allowing it to traverse firewalls and proxies that might block non-HTTP traffic.
- This handshake involves the client sending an HTTP request with specific headers indicating the desire to upgrade the connection to WebSocket.
Client Request:
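A typical opening handshake request looks like the following (the host, path, and subprotocol are illustrative; the sample key is the one used in RFC 6455):

```http
GET /chat HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Protocol: chat
Sec-WebSocket-Version: 13
```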
In this request:
- Upgrade: websocket and Connection: Upgrade headers signal the intent to switch protocols.
- Sec-WebSocket-Key is a base64-encoded random nonce used for security.
- Sec-WebSocket-Protocol specifies subprotocols the client wishes to use.
- Sec-WebSocket-Version indicates the WebSocket protocol version.
The server responds with a status code 101 Switching Protocols, confirming the protocol upgrade:
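A typical response looks like this (the accept value shown corresponds to the sample key from RFC 6455):

```http
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```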
Here, Sec-WebSocket-Accept is a hash derived from the client's Sec-WebSocket-Key, proving to the client that the server actually understood the WebSocket handshake rather than blindly echoing headers.
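The derivation is defined by RFC 6455: the server appends a fixed GUID to the client's key, hashes the result with SHA-1, and base64-encodes the digest. A minimal sketch in Go (the function name computeAccept is ours, not from any library):

```go
package main

import (
	"crypto/sha1"
	"encoding/base64"
	"fmt"
)

// computeAccept derives the Sec-WebSocket-Accept value per RFC 6455:
// SHA-1 of the client key concatenated with a fixed GUID, base64-encoded.
func computeAccept(clientKey string) string {
	const guid = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"
	sum := sha1.Sum([]byte(clientKey + guid))
	return base64.StdEncoding.EncodeToString(sum[:])
}

func main() {
	// Sample key from RFC 6455 section 1.3.
	fmt.Println(computeAccept("dGhlIHNhbXBsZSBub25jZQ==")) // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
}
```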
WebSocket Frame
Once the connection is established, data is transmitted in frames, which are the smallest unit of communication in WebSocket.
Let's walk through reading and interpreting WebSocket frames by examining the accompanying Go code, explaining the purpose behind each operation and how together they produce an accurately parsed frame.
Unpacking the WebSocket Frame Structure
A WebSocket frame is composed of several fields, each serving a specific function:
- FIN Bit (1 bit): Indicates if the frame is the final fragment in a message. A value of 1 signifies that this is the last frame.
- RSV1, RSV2, RSV3 (3 bits): Reserved bits for future extensions; typically set to 0.
- Opcode (4 bits): Determines the nature of the frame's payload:
- 0x0: Continuation frame
- 0x1: Text frame
- 0x2: Binary frame
- 0x8: Connection close frame
- 0x9: Ping frame
- 0xA: Pong frame
- Mask Bit (1 bit): Indicates if the payload data is masked.
- Payload Length (7 bits, 7+16 bits, or 7+64 bits): Specifies the length of the payload data.
- Masking Key (0 or 32 bits): A 4-byte key used to unmask the payload data.
- Payload Data (variable length): The actual application data being transmitted.
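In Go, these fields map naturally onto a struct. A sketch (field names are illustrative, not from any particular library):

```go
package main

// Frame mirrors the wire-level fields of a WebSocket frame.
type Frame struct {
	Fin        bool   // FIN bit: final fragment of a message
	Opcode     byte   // frame type: 0x1 text, 0x2 binary, 0x8 close, ...
	Masked     bool   // whether the payload is masked (client-to-server frames must be)
	PayloadLen uint64 // actual payload length after extended-length handling
	MaskingKey [4]byte
	Payload    []byte
}

func main() {}
```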
Parsing the Frame: A Step-by-Step Breakdown
The ReadFrame function in the provided Go code is designed to read and interpret a single WebSocket frame from a TCP connection. Here's a detailed walkthrough of its operations:
1. Reading the First Byte:
The function begins by reading the first byte, which contains the FIN bit (1 bit) and Opcode (4 bits).
2. Extracting the FIN Bit and Opcode:
- FIN Bit: The operation (firstByte[0] & 0x80) != 0 checks if the most significant bit (MSB) is set. If it is, frame.Fin is true, indicating this is the final fragment.
- Opcode: The operation firstByte[0] & 0x0F isolates the lower four bits, determining the frame's type.
Explanation:
- firstByte[0] & 0x80 masks everything except the first bit (the FIN bit).
  - 0x80 in binary: 1000 0000
  - Example: if firstByte[0] is 1000 0010 (FIN = 1, Opcode = 2):
    - 1000 0010 & 1000 0000 = 1000 0000
    - != 0 → true (the FIN bit is set)
- firstByte[0] & 0x0F extracts the lower 4 bits (the Opcode).
  - 0x0F in binary: 0000 1111
  - Example: 1000 0010 & 0000 1111 = 0000 0010 (Opcode = 2)
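The first-byte extraction above can be sketched directly (the value 0x82 matches the worked example):

```go
package main

import "fmt"

func main() {
	firstByte := byte(0x82) // 1000 0010: FIN = 1, Opcode = 2 (binary frame)

	fin := (firstByte & 0x80) != 0 // keep only the most significant bit
	opcode := firstByte & 0x0F     // keep only the lower 4 bits

	fmt.Println(fin, opcode) // true 2
}
```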
3. Reading the Second Byte:
The second byte contains the Mask bit (1 bit) and the initial Payload Length (7 bits).
4. Extracting the Mask Bit and Payload Length:
- Mask Bit: Similar to the FIN bit extraction, (secondByte[0] & 0x80) != 0 checks if the MSB is set. If true, the payload is masked.
- Payload Length: secondByte[0] & 0x7F retrieves the lower 7 bits, providing the payload length.
Explanation:
- secondByte[0] & 0x80 extracts the mask bit.
  - 0x80 in binary: 1000 0000
  - If secondByte[0] = 1000 0011, the MSB is set, so Masked is true.
- secondByte[0] & 0x7F extracts the 7-bit payload length.
  - 0x7F in binary: 0111 1111
  - Example: 1000 0011 & 0111 1111 = 0000 0011 (payload length = 3)
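The second-byte extraction follows the same pattern (0x83 matches the worked example):

```go
package main

import "fmt"

func main() {
	secondByte := byte(0x83) // 1000 0011: Mask = 1, payload length = 3

	masked := (secondByte & 0x80) != 0 // mask bit
	payloadLen := secondByte & 0x7F    // 7-bit payload length

	fmt.Println(masked, payloadLen) // true 3
}
```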
5. Handling Extended Payload Lengths:
Depending on the value of payloadLen, additional bytes may be read to determine the actual payload length:
- ≤ 125: The payload length is the value itself.
- 126: The next 2 bytes hold the actual length as a 16-bit unsigned integer.
- 127: The next 8 bytes hold the actual length as a 64-bit unsigned integer.
This approach ensures that frames with larger payloads are accurately processed.
6. Reading the Masking Key:
If the mask bit is set, the next 4 bytes represent the masking key.
- Reads 4 bytes for the masking key, which will be used to unmask the payload.
7. Reading and Unmasking the Payload:
- Reads the actual data being transmitted.
- If the message is masked, we must unmask it using the masking key.
Explanation:
- Each byte in the payload is XORed with a byte from the masking key.
- The masking key is only 4 bytes long, so we cycle through it:
- i % 4 ensures we use the correct key byte in a loop.
WebSocket using raw TCP
- For the complete WebSocket server implementation using raw TCP, check out the full code here.
Building a Scalable WebSocket Architecture
When designing a WebSocket-based system, scalability isn't just a buzzword; it's a necessity. A poorly structured WebSocket server can quickly become a bottleneck, struggling under heavy load and leading to performance degradation. That's why it's crucial to architect the system in a way that efficiently handles multiple connections while keeping the codebase clean and maintainable.
Breaking Down the Architecture
The WebSocket server can be thought of as the backbone of real-time communication. But instead of cramming everything into one giant module, we break it down into distinct components that each serve a specific role:
- WS Server: The WebSocket server is responsible for managing the TCP listener, accepting connections, and handling event polling. This ensures that incoming messages are efficiently processed without blocking other operations.
- Handlers: Handlers contain the core business logic, message processing, and state management. Keeping this separate from the WebSocket server means you can scale your business logic independently without affecting the underlying connection management.
- Connections: Each WebSocket connection follows the protocol, reads/writes frames, and maintains a persistent session. This abstraction ensures that connections are efficiently managed, making it easier to implement features like auto-reconnect, heartbeats, and load balancing.
Adding Abstraction for Scalability
To take it a step further, we introduce a layer of abstraction that keeps the WebSocket server loosely coupled with specific implementations. This includes:
- WebSocket Handler Interface: Defines lifecycle methods like OnConnect(), OnMessage(), and OnClose(), allowing different modules (such as a chat system or a live dashboard) to plug in their own logic without modifying the core WebSocket server.
- Chat Handler (or any domain-specific handler): Manages user state, processes messages, and broadcasts updates. This modularity ensures that multiple real-time features can be developed and scaled separately.
- Connection Management: With a dedicated connection handler, we maintain WebSocket pointers, handle pings/pongs, and ensure a smooth user experience.
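The handler abstraction described above can be sketched as a small Go interface. The names (Handler, ChatHandler, Conn) are illustrative, not taken from the linked implementation:

```go
package main

import "fmt"

// Conn stands in for a single WebSocket connection in this sketch.
type Conn struct{ ID int }

// Handler defines the lifecycle hooks a domain module plugs into the server,
// keeping the core server loosely coupled from business logic.
type Handler interface {
	OnConnect(c *Conn)
	OnMessage(c *Conn, payload []byte)
	OnClose(c *Conn)
}

// ChatHandler is one possible domain-specific implementation.
type ChatHandler struct{}

func (h *ChatHandler) OnConnect(c *Conn) { fmt.Println("connected:", c.ID) }
func (h *ChatHandler) OnMessage(c *Conn, payload []byte) {
	fmt.Println("message:", string(payload))
}
func (h *ChatHandler) OnClose(c *Conn) { fmt.Println("closed:", c.ID) }

func main() {
	// The server only sees the Handler interface, so a dashboard or
	// notification handler could be swapped in without touching it.
	var h Handler = &ChatHandler{}
	h.OnConnect(&Conn{ID: 1})
	h.OnMessage(&Conn{ID: 1}, []byte("hello"))
	h.OnClose(&Conn{ID: 1})
}
```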
Why This Architecture Scales Well
- Separation of Concerns: By decoupling the WebSocket server from business logic, we make it easier to maintain, test, and extend features independently.
- Improved Performance: The architecture ensures that incoming messages are handled efficiently without blocking the main thread.
- Domain-Specific Scalability: Whether it's a chat system, notifications, or live updates, each domain can be optimized separately.
- Easier Load Balancing: Since the WebSocket connections and handlers are modular, they can be distributed across multiple servers using techniques like sticky sessions or WebSocket brokers (e.g., Redis, NATS).
By structuring WebSocket services this way, we ensure they can handle thousands (or even millions) of connections without breaking a sweat. Curious to see the implementation in action? Check it out here.
Scaling with Goroutines & Channels
Scaling WebSockets isn't as straightforward as scaling traditional HTTP applications because WebSockets maintain persistent connections. That means we can't just spin up more stateless instances to handle additional load; we need a smarter approach.
The Vertical Scaling Challenge
- First, let's talk about vertical scaling. The goal here is simple: keep CPU and memory utilization as low as possible while efficiently handling millions of WebSocket connections.
- At first glance, it might seem logical to dedicate a separate goroutine to each WebSocket connection. After all, Go's goroutines are lightweight, and this would allow us to utilize all CPU cores effectively.
- But there's a catch: each goroutine requires its own memory stack, typically 2 to 8 KB. If we assume an average of 4 KB per goroutine, handling 3 million WebSocket connections would require 12 GB of memory, and that's just for goroutines! Add in buffers for reading and writing messages, and the memory footprint grows significantly.
And we haven't even built the actual application yet!
Optimizing Goroutines & Buffers
- So how do we reduce this memory footprint? Instead of creating a dedicated goroutine for every connection, we can use a worker pool. Rather than each WebSocket connection spinning up its own read/write goroutines, we assign connections to a pool of worker goroutines that handle incoming and outgoing messages efficiently.
- We can further optimize memory usage by reusing buffers instead of allocating new ones for every message. By leveraging polling mechanisms, such as epoll on Linux, we can wake up worker goroutines only when there's actual data to process. This prevents thousands of goroutines from idling and consuming unnecessary memory.
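The worker-pool idea can be sketched with plain goroutines and a channel. This is a simplified model (the message type and serve function are ours; a real server would feed the channel from an epoll-style poller rather than a loop):

```go
package main

import (
	"fmt"
	"sync"
)

// message pairs a connection ID with its payload; in a real server it
// would be produced by the poller when a socket becomes readable.
type message struct {
	connID  int
	payload string
}

// serve fans n readiness events out to a fixed pool of worker goroutines
// and returns how many messages were handled.
func serve(n, workers int) int {
	jobs := make(chan message)
	var wg sync.WaitGroup
	var mu sync.Mutex
	handled := 0

	// A small, fixed pool serves all connections instead of one
	// goroutine per connection.
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for m := range jobs {
				_ = m.payload // frame handling would go here
				mu.Lock()
				handled++
				mu.Unlock()
			}
		}()
	}

	// Simulate readiness events for n connections.
	for i := 0; i < n; i++ {
		jobs <- message{connID: i, payload: "ping"}
	}
	close(jobs)
	wg.Wait()
	return handled
}

func main() {
	// 1000 simulated connections served by only 4 goroutines.
	fmt.Println(serve(1000, 4)) // 1000
}
```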
The Takeaway
Scaling WebSockets is tricky because of persistent connections and the need for efficient resource management. A naïve approach, one goroutine per connection, leads to huge memory consumption. Instead, we can scale efficiently by:
- Using worker pools instead of dedicated goroutines per connection.
- Reusing buffers to minimize unnecessary allocations.
- Leveraging polling mechanisms to process connections only when needed.
This approach ensures that we maximize resource utilization while keeping our WebSocket server scalable and performant. Want a deep dive into the implementation? Check out the reference below for more details.