Sunday, February 23, 2025

WebSocket

Profile Pic of Akash AmanAkash Aman

Updated: April 2025

Table of Contents

πŸ“ Introduction

WebSocket is a protocol that facilitates full-duplex communication channels over a single TCP connection, enabling real-time data exchange between clients and servers.

Unlike traditional HTTP communication, where each request-response cycle is initiated by the client, WebSocket allows both client and server to send data independently once the connection is established.

❓How it Works

  • The WebSocket connection begins with an opening handshake, which is compatible with HTTP, allowing it to traverse firewalls and proxies that might block non-HTTP traffic.
  • This handshake involves the client sending an HTTP request with specific headers indicating the desire to upgrade the connection to WebSocket.

ClientServerconnection:Upgradeupgrade:websocketsec-websocket-key:/QaoPqM6p+qOmhleyv2P/g==connection:Upgradeupgrade:websocketsec-websocket-accept:1zQBdxBjy9cykwSYQKd10wkvzVA=Websocket ConnectionEstablished


Client Request:

http
GET /chat HTTP/1.1 Host: server.example.com Upgrade: websocket Connection: Upgrade Sec-WebSocket-Key: x3JJHMbDL1EzLkh9GBhXDw== Sec-WebSocket-Protocol: chat, superchat Sec-WebSocket-Version: 13 Origin: http://example.com

In this request:

  • Upgrade: websocket and Connection: Upgrade headers signal the intent to switch protocols.
  • Sec-WebSocket-Key is a base64-encoded random nonce used for security.
  • Sec-WebSocket-Protocol specifies subprotocols the client wishes to use.
  • Sec-WebSocket-Version indicates the WebSocket protocol version.

The server responds with a status code 101 Switching Protocols, confirming the protocol upgrade:


http
HTTP/1.1 101 Switching Protocols Upgrade: websocket Connection: Upgrade Sec-WebSocket-Accept: HSmrc0sMlYUkAGmm5OPpG2HaGWk= Sec-WebSocket-Protocol: chat

Here, Sec-WebSocket-Accept is a hash derived from the client’s Sec-WebSocket-Key, ensuring the legitimacy of the request.


Image

πŸ–ΌοΈ WebSocket Frame

Once the connection is established, data is transmitted in frames, which are the smallest unit of communication in WebSocket.

Each frame encapsulates a portion of the message, accompanied by metadata that dictates how the data should be processed. Understanding the structure and parsing of these frames is crucial for developers aiming to implement or troubleshoot WebSocket communications.

Let’s delve into the intricacies of reading and interpreting WebSocket frames by examining the provided Go code snippet. This exploration will elucidate the purpose behind each operation and how they collectively facilitate accurate frame parsing.

πŸ“¦ Unpacking the WebSocket Frame Structure


FIN0Bits 1234 - 7opcode(4)01 - 7Payloadlength(7)MASK1st Byte2nd ByteExtended Payload length(16/64)(if payload len==126/127) 16 bits or 64 bits2 byte or 8 byte0111 1110 = 0x7E = 1260111 1111 = 0x7F = 127Masking Key32bits4 bytesPayloadFinal PayloadlengthRSV1RSV2RSV31000= 0x80 & 1st byteBinaryHexDecimal0000= 0x0F & 1st byte= 0x80 & 2nd byte= 0x7F & 2nd byte0111 FIN bit to determine if it is the final fragment.Opcode is a 4-bit field in the WebSocket frame header that tells us thepurpose of the frame. 0x00000Continuation frame (part of a fragmented message)0x10001Text frame (UTF-8 encoded text data)0x20010Binary frame (binary data, such as images or files)0x81000Close connection (indicates the connection should be closed)0x91001Ping (used to check if the connection is alive)0xA1010Pong (response to a Ping frame)OpcodeHexBinMASK bit to determine if the payload data is masked.Length of the payload.Websocket Frame0x801280x0F150x7F127111111110000OpcodeFinMaskPayload Len

A WebSocket frame is composed of several fields, each serving a specific function:

  • FIN Bit (1 bit): Indicates if the frame is the final fragment in a message. A value of 1 signifies that this is the last frame.
  • RSV1, RSV2, RSV3 (3 bits): Reserved bits for future extensions; typically set to 0.
  • Opcode (4 bits): Determines the nature of the frame’s payload:
    • 0x0: Continuation frame
    • 0x1: Text frame
    • 0x2: Binary frame
    • 0x8: Connection close frame
    • 0x9: Ping frame
    • 0xA: Pong frame
  • Mask Bit (1 bit): Indicates if the payload data is masked.
  • Payload Length (7 bits, 7+16 bits, or 7+64 bits): Specifies the length of the payload data.
  • Masking Key (0 or 32 bits): A 4-byte key used to unmask the payload data.
  • Payload Data (variable length): The actual application data being transmitted.

Before advancing to the next topic, ensure you have a solid grasp of binary to hexadecimal conversions in both directions. Additionally, comprehend basic bitwise operations: AND, OR, XOR, NOR, NAND between binary and hexadecimal numbers for effective understanding. Review these concepts beforehand.

🌿 Parsing the Frame: A Step-by-Step Breakdown

The ReadFrame function in the provided Go code is designed to read and interpret a single WebSocket frame from a TCP connection. Here’s a detailed walkthrough of its operations:

1. Reading the First Byte:

go
firstByte := make([]byte, 1) if _, err := io.ReadFull(conn, firstByte); err != nil { return nil, err }

The function begins by reading the first byte, which contains the FIN bit (1 bit) and Opcode (4 bits).

2. Extracting the FIN Bit and Opcode:

go
frame.Fin = (firstByte[0] & 0x80) != 0 // 0x80 is 1000 0000 in binary frame.Opcode = firstByte[0] & 0x0F // 0x0F is 0000 1111 in binary
  • FIN Bit: The operation (firstByte[0] & 0x80) != 0 checks if the most significant bit (MSB) is set. If it is, frame.Fin is true, indicating this is the final fragment.
  • Opcode: The operation firstByte[0] & 0x0F isolates the last four bits, determining the frame’s type.

Explaination

  • firstByte[0] & 0x80 β†’ Masks everything except the first bit (FIN bit).

    • 0x80 in binary: 1000 0000

    • Example: If firstByte[0] is 1000 0010 (Opcode = 2, FIN = 1)

      • 1000 0010 & 1000 0000 = 1000 0000
      • != 0 β†’ true (FIN bit is set)
  • firstByte[0] & 0x0F β†’ Extracts the lower 4 bits (Opcode).

    • 0x0F in binary: 0000 1111

    • Example: 1000 0010 & 0000 1111 = 0000 0010 (Opcode = 2)

3. Reading the Second Byte:

go
secondByte := make([]byte, 1) if _, err := io.ReadFull(conn, secondByte); err != nil { return nil, err }

The second byte contains the Mask bit (1 bit) and the initial Payload Length (7 bits).

4. Extracting the Mask Bit and Payload Length:

go
frame.Masked = (secondByte[0] & 0x80) != 0 // 0x80 is 1000 0000 in binary payloadLen := secondByte[0] & 0x7F // 0x7F is 0111 1111 in binary
  • Mask Bit: Similar to the FIN bit extraction, (secondByte[0] & 0x80) != 0 checks if the MSB is set. If true, the payload is masked.
  • Payload Length: secondByte[0] & 0x7F retrieves the lower 7 bits, providing the payload length.

Explaination

  • secondByte[0] & 0x80 β†’ Extracts the mask bit.

    • 0x80 in binary: 1000 0000
    • If secondByte[0] = 1000 0011, then Masked is true.
  • secondByte[0] & 0x7F β†’ Extracts the 7-bit payload length.

    • 0x7F in binary: 0111 1111
    • Example: 1000 0011 & 0111 1111 = 0000 0011 (Payload length = 3)

5. Handling Extended Payload Lengths:

Depending on the value of payloadLen, additional bytes may be read to determine the actual payload length:

  • 126: Indicates that the payload length is extended to the next 2 bytes.
  • 127: Indicates that the payload length is extended to the next 8 bytes.
  • ≀ 125: The payload length is as stated.

go
switch payloadLen { case 126: extendedLen := make([]byte, 2) if _, err := io.ReadFull(conn, extendedLen); err != nil { return nil, err } frame.PayloadLen = uint64(binary.BigEndian.Uint16(extendedLen)) case 127: extendedLen := make([]byte, 8) if _, err := io.ReadFull(conn, extendedLen); err != nil { return nil, err } frame.PayloadLen = binary.BigEndian.Uint64(extendedLen) default: frame.PayloadLen = uint64(payloadLen) }

This approach ensures that frames with larger payloads are accurately processed.

6: Reading the Masking Key

If the mask bit is set, the next 4 bytes represent the masking key.

go
if frame.Masked { frame.MaskKey = make([]byte, 4) if _, err := io.ReadFull(conn, frame.MaskKey); err != nil { return nil, err } }
  • Reads 4 bytes for the masking key, which will be used to unmask the payload.

7: Reading and Unmasking the Payload

go
if frame.PayloadLen > 0 { frame.Payload = make([]byte, frame.PayloadLen) if _, err := io.ReadFull(conn, frame.Payload); err != nil { return nil, err } if frame.Masked { for i := range frame.Payload { frame.Payload[i] ^= frame.MaskKey[i%4] } } }
  • Reads the actual data being transmitted.
  • If the message is masked, we must unmask it using the masking key.

Explaination

  • Each byte in the payload is XORed with a byte from the masking key.
  • The masking key is only 4 bytes long, so we cycle through it:

go
frame.Payload[i] ^= frame.MaskKey[i%4]
  • i % 4 ensures we use the correct key byte in a loop.

🧩 WebSocket using raw TCP

  • For the complete WebSocket server implementation using raw TCP, check out the full code here.

go
package tcp import ( "bufio" "crypto/sha1" "encoding/base64" "encoding/binary" "encoding/json" "fmt" "io" "log" "net" "net/http" "strings" "sync" ) type Msg struct { Role string `json:"role"` Content string `json:"content"` } /** * WebSocket Frame. */ type Frame struct { Fin bool // Fin indicates if this is the final fragment in a message. Opcode byte // Opcode defines the interpretation of the payload data. Masked bool // Masked indicates if the payload data is masked. PayloadLen uint64 // PayloadLen specifies the length of the payload data. MaskKey []byte // MaskKey is the masking key used to unmask the payload data. Payload []byte // Payload contains the actual data being transmitted. } /** * * ReadFrame reads a single WebSocket frame from a TCP connection. * * * In the WebSocket protocol, the first byte of a frame contains several important pieces of information. Let's break down the first byte: * FIN bit (1 bit): The Most Significant Bit (MSB) of the first byte (bit 7) indicates whether this is the final fragment in a message. If set to 1, it means this is the final fragment. * RSV1, RSV2, RSV3 bits (3 bits): The next three bits (bits 6, 5, and 4) are reserved for future use. They should be set to 0 unless an extension defines otherwise. These bits are typically not used in standard WebSocket communication. * Opcode (4 bits): The last four bits (bits 3 to 0) of the first byte define the frame's type. For example: * 0x0 (0000): Continuation frame * 0x1 (0001): Text frame * 0x2 (0010): Binary frame * 0x8 (1000): Connection close frame * 0x9 (1001): Ping frame * 0xA (1010): Pong frame */ func ReadFrame(conn net.Conn) (*Frame, error) { frame := &Frame{} firstByte := make([]byte, 1) if _, err := io.ReadFull(conn, firstByte); err != nil { return nil, err } /** * Note: 1 Byte is 8 bits. * Anything bitwise ( & ) with above will be either 0x80 or 0 * 0x80 -> 1000 0000 * 0x0F -> 0000 1111 * * frame.Fin extracts the FIN bit to determine if this is the final fragment. * frame.Opcode extracts the last four bits to determine the frame type. * The reserved bits (RSV1, RSV2, RSV3) are not used in this code, which is typical unless you are implementing or using WebSocket extensions that require these bits. */ frame.Fin = (firstByte[0] & 0x80) != 0 // Determines whether the MSB is 1. frame.Opcode = firstByte[0] & 0x0F // Determines the right 4 bits from first byte. log.Printf("Fin: %v \n", frame.Fin) secondByte := make([]byte, 1) if _, err := io.ReadFull(conn, secondByte); err != nil { return nil, err } /** * Note: 1 Byte is 8 bits. * Anything bitwise ( & ) with above will be either 0x80 or 0 * 0x80 -> 1000 0000 * 0x7F -> 0111 1111 * * frame.Masked extracts the MASK bit to determine if the payload data is masked. * payloadLen extracts the last seven bits to determine the payload length. */ frame.Masked = (secondByte[0] & 0x80) != 0 // Determine the whether the MSB is 1 payloadLen := secondByte[0] & 0x7F // Read extended payload length if necessary switch payloadLen { case 126: // 0111 1110 -> 0x7E extendedLen := make([]byte, 2) if _, err := io.ReadFull(conn, extendedLen); err != nil { return nil, err } frame.PayloadLen = uint64(binary.BigEndian.Uint16(extendedLen)) case 127: // 0111 1111 -> 0x7F extendedLen := make([]byte, 8) if _, err := io.ReadFull(conn, extendedLen); err != nil { return nil, err } frame.PayloadLen = binary.BigEndian.Uint64(extendedLen) default: frame.PayloadLen = uint64(payloadLen) } /** * Note: 1 Byte is 8 bits. * Anything bitwise ( & ) with above will be either 0x80 or 0 * * frame.MaskKey extracts the MASK KEY bit. * MASK KEY is of 4 byte. * */ if frame.Masked { frame.MaskKey = make([]byte, 4) if _, err := io.ReadFull(conn, frame.MaskKey); err != nil { return nil, err } } // Read payload if frame.PayloadLen > 0 { frame.Payload = make([]byte, frame.PayloadLen) if _, err := io.ReadFull(conn, frame.Payload); err != nil { return nil, err } /** * Certainly! Let's break down the operation in the provided Go code: * * Explanation * frame.Payload[i]: * * This accesses the i-th element of the Payload slice (or array) * within the frame struct. The Payload is likely a byte slice ([]byte). * frame.MaskKey[i%4]: * * This accesses an element of the MaskKey slice (or array) within the frame struct. * The index used here is i%4, which means the index is the remainder of i divided by 4. * This ensures that the index cycles through 0, 1, 2, and 3, regardless of how large i gets. * ^= (XOR assignment operator): * * The ^= operator performs a bitwise XOR operation between the left-hand side and * the right-hand side, and then assigns the result back to the left-hand side. * In this case, it XORs frame.Payload[i] with frame.MaskKey[i%4] and stores the * result back in frame.Payload[i]. * * Context * This operation is commonly used in WebSocket implementations for masking and * unmasking data frames. The WebSocket protocol specifies that payload data must * be XORed with a masking key to obscure the data being transmitted. * * Example * Let's say frame.Payload is [0x01, 0x02, 0x03, 0x04] and * frame.MaskKey is [0xAA, 0xBB, 0xCC, 0xDD]. For i = 0, the operation would be: * * After the operation, frame.Payload would be [0xAB, 0x02, 0x03, 0x04]. * * This process would repeat for each element in frame.Payload, cycling through the MaskKey. * * Summary * The line of code is performing a bitwise XOR operation between each byte of the Payload and * a corresponding byte from the MaskKey, cycling through the MaskKey every 4 bytes. * This is typically used for encoding or decoding data in WebSocket frames. */ if frame.Masked { for i := range frame.Payload { frame.Payload[i] ^= frame.MaskKey[i%4] } } } return frame, nil } // OpcodeName returns the string representation of the opcode func (f *Frame) OpcodeName() string { switch f.Opcode { case 0x0: return "continuation" case 0x1: return "text" case 0x2: return "binary" case 0x8: return "close" case 0x9: return "ping" case 0xA: return "pong" default: return "unknown" } } func NewServer(wg *sync.WaitGroup) { defer wg.Done() listener, err := net.Listen("tcp", fmt.Sprintf(":%d", port)) if err != nil { log.Println("Error starting WebSocket server:", err) return } defer listener.Close() log.Printf("WebSocket Server running on port %d\n", port) for { conn, err := listener.Accept() if err != nil { log.Println("Error accepting WebSocket connection:", err) continue } go handleWebSocket(conn) } } func handleWebSocket(conn net.Conn) { defer conn.Close() // Step 1: Perform WebSocket handshake reader := bufio.NewReader(conn) request, err := http.ReadRequest(reader) if err != nil { log.Println("Error reading HTTP request:", err) return } // Validate WebSocket handshake if !strings.Contains(request.Header.Get("Connection"), "Upgrade") || request.Header.Get("Upgrade") != "websocket" { log.Println("Invalid WebSocket handshake") return } // WebSocket handshake response key := request.Header.Get("Sec-WebSocket-Key") acceptKey := generateWebSocketAcceptKey(key) response := fmt.Sprintf( "HTTP/1.1 101 Switching Protocols\r\n"+ "Upgrade: websocket\r\n"+ "Connection: Upgrade\r\n"+ "Sec-WebSocket-Accept: %s\r\n\r\n", acceptKey, ) _, err = conn.Write([]byte(response)) if err != nil { log.Println("Error sending handshake response:", err) return } log.Println("WebSocket handshake completed") // Step 2: Handle WebSocket frames for { frame, err := ReadFrame(conn) if err != nil { if err == io.EOF { log.Println("Client disconnected") } else { log.Println("Error reading WebSocket frame:", err) } return } log.Printf("Received frame type: %s", frame.OpcodeName()) if len(frame.Payload) > 0 { log.Printf("Payload: %s", string(frame.Payload)) } // Handle different frame types switch frame.OpcodeName() { case "close": log.Println("Closing connection") return case "ping": log.Println("Received ping") case "pong": log.Println("Received pong") case "text": var msg Msg err := json.Unmarshal(frame.Payload, &msg) if err != nil { log.Println("Error parsing JSON:", err) continue } log.Printf("Received message: %s", msg.Content) response := Msg{Role: "agent", Content: "Message Recieved"} responseJSON, _ := json.Marshal(response) sendFrame(conn, responseJSON) } } } func sendFrame(conn net.Conn, payload []byte) { frame := &Frame{ Fin: true, Opcode: 0x1, // Text frame PayloadLen: uint64(len(payload)), Payload: payload, } header := []byte{0x81} // FIN + Text frame opcode if frame.PayloadLen <= 125 { header = append(header, byte(frame.PayloadLen)) } else if frame.PayloadLen <= 65535 { header = append(header, 126) extendedLen := make([]byte, 2) binary.BigEndian.PutUint16(extendedLen, uint16(frame.PayloadLen)) header = append(header, extendedLen...) } else { header = append(header, 127) extendedLen := make([]byte, 8) binary.BigEndian.PutUint64(extendedLen, frame.PayloadLen) header = append(header, extendedLen...) } conn.Write(header) conn.Write(payload) } func generateWebSocketAcceptKey(key string) string { h := sha1.New() h.Write([]byte(key + "258EAFA5-E914-47DA-95CA-C5AB0DC85B11")) return base64.StdEncoding.EncodeToString(h.Sum(nil)) } const port = 4443

πŸš€ Building a Scalable WebSocket Architecture

When designing a WebSocket-based system, scalability isn’t just a buzzwordβ€”it’s a necessity. A poorly structured WebSocket server can quickly become a bottleneck, struggling under heavy load and leading to performance degradation. That’s why it’s crucial to architect the system in a way that efficiently handles multiple connections while keeping the codebase clean and maintainable.

Breaking Down the Architecture

WS Server- Manages TCP Listener- Accept Connections- Event PollingConnections- Manages WS Protocol- Read/Writes frames- Maintains ConnectionsHandlers- Business Logic- Message Processing- State Management


The WebSocket server can be thought of as the backbone of real-time communication. But instead of cramming everything into one giant module, we break it down into distinct components that each serve a specific role:

  • πŸ–₯️ WS Server The WebSocket server is responsible for managing the TCP listener, accepting connections, and handling event polling. This ensures that incoming messages are efficiently processed without blocking other operations.

  • πŸ› οΈ Handlers Handlers contain the core business logic, message processing, and state management. Keeping this separate from the WebSocket server means you can scale your business logic independently without affecting the underlying connection management.

  • πŸ”„ Connections Each WebSocket connection follows the protocol, reads/writes frames, and maintains a persistent session. This abstraction ensures that connections are efficiently managed, making it easier to implement features like auto-reconnect, heartbeats, and load balancing.

Adding Abstraction for Scalability

To take it a step further, we introduce a layer of abstraction that keeps the WebSocket server loosely coupled with specific implementations. This includes:


ConnectionHub / WebSocket Handler InterfaceUserChat Handler- Manage User State- Process Message- Broadcast User- OnConnect()- OnMessage()- OnClose()ABSTRACTIONWS Server- WS Handler Pointer- WritePong()- Setup Workerpool- Setup Poller- Setup Connection Handler


  • ✨ WebSocket Handler Interface Defines lifecycle methods like OnConnect(), OnMessage(), and OnClose(), allowing different modules (such as a chat system or a live dashboard) to plug in their own logic without modifying the core WebSocket server.

  • πŸ—£οΈ Chat Handler (or any domain-specific handler) Manages user state, processes messages, and broadcasts updates. This modularity ensures that multiple real-time features can be developed and scaled separately.

  • πŸ”— Connection Management With a dedicated connection handler, we maintain WebSocket pointers, handle pings/pongs, and ensure a smooth user experience.

Why This Architecture Scales Well?

  • Separation of Concerns – By decoupling the WebSocket server from business logic, we make it easier to maintain, test, and extend features independently.

  • Improved Performance – The architecture ensures that incoming messages are handled efficiently without blocking the main thread.

  • Domain-Specific Scalability – Whether it’s a chat system, notifications, or live updates, each domain can be optimized separately.

  • Easier Load Balancing – Since the WebSocket connections and handlers are modular, they can be distributed across multiple servers using techniques like sticky sessions or WebSocket brokers (e.g., Redis, NATS).


By structuring WebSocket services this way, we ensure they can handle thousands (or even millions) of connections without breaking a sweat. πŸš€ Curious to see the implementation in action? Check it out here.

πŸš€ Scaling with Goroutines & Channels

Scaling WebSockets isn’t as straightforward as scaling traditional HTTP applications because WebSockets maintain persistent connections. That means we can’t just spin up more stateless instances to handle additional loadβ€”we need a smarter approach. 🧠

⚑ The Vertical Scaling Challenge

  • First, let’s talk about vertical scaling. The goal here is simple: keep CPU and memory utilization as low as possible while efficiently handling millions of WebSocket connections. πŸ’Ύ
  • At first glance, it might seem logical to dedicate a separate goroutine to each WebSocket connection. After all, Go’s goroutines are lightweight, and this would allow us to utilize all CPU cores effectively. πŸ—οΈ

  • But there’s a catchβ€”each goroutine requires its own memory stack, typically 2 to 8 KB. If we assume an average of 4 KB per goroutine, handling 3 million WebSocket connections would require 12 GB of memoryβ€”and that’s just for goroutines! πŸ’₯ Add in buffers for reading and writing messages, and the memory footprint grows significantly. πŸ“ˆ

And we haven’t even built the actual application yet! πŸ˜…

πŸ› οΈ Optimizing Goroutines & Buffers

  • So how do we reduce this memory footprint? Instead of creating a dedicated goroutine for every connection, we can use a worker pool. 🎑 Rather than each WebSocket connection spinning up its own read/write goroutines, we assign connections to a pool of worker goroutines that handle incoming and outgoing messages efficiently.

  • We can further optimize memory usage by reusing buffers instead of allocating new ones for every message. πŸ”„ By leveraging polling mechanisms, such as epoll on Linux, we can wake up worker goroutines only when there’s actual data to process. πŸ‹οΈβ€β™‚οΈ This prevents thousands of goroutines from idling and consuming unnecessary memory. 🧡

🎯 The Takeaway

Scaling WebSockets is tricky because of persistent connections and the need for efficient resource management. A naΓ―ve approachβ€”one goroutine per connectionβ€”leads to huge memory consumption. Instead, we can scale efficiently by:

  • βœ… Using worker pools instead of dedicated goroutines per connection.
  • βœ… Reusing buffers to minimize unnecessary allocations.
  • βœ… Leveraging polling mechanisms to process connections only when needed.

This approach ensures that we maximize resource utilization while keeping our WebSocket server scalable and performant. πŸš€ Want a deep dive into implementation? Check out below reference for more details.

πŸš€ Resources