WebSockets 101

Intro

Disclaimer: A week ago I didn’t know much about WebSockets. My only experience with them was a chat application I built back in 2016 using SailsJS, a JS framework that tried to be a Ruby on Rails implementation. So I decided to research the technology and consumed multiple resources, which I link throughout this blog post and in each section.

WebSockets are a way to handle full-duplex (two-way) communication. They are very useful for building applications that need real-time data, like a chat application or a stock dashboard. WebSockets arrived as an improvement over previous solutions like:

Polling

The simplest solution consists of making HTTP requests at a set interval.
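A minimal sketch of interval polling, to make the idea concrete. The helper name startPolling, its parameters, and the /api/messages endpoint in the usage comment are assumptions made for illustration, not part of any library.

```javascript
// Hypothetical helper: calls `fetchLatest` every `intervalMs` milliseconds
// and hands each result to `onData`. Returns a function that stops polling.
function startPolling(fetchLatest, onData, intervalMs = 5000) {
  const timer = setInterval(async () => {
    try {
      onData(await fetchLatest());
    } catch (err) {
      // A failed request is only logged; the next tick simply tries again.
      console.error('Polling request failed:', err);
    }
  }, intervalMs);

  return () => clearInterval(timer);
}

// Example usage (the endpoint is an assumption):
// const stop = startPolling(
//   () => fetch('/api/messages').then((res) => res.json()),
//   (messages) => console.log('Latest messages:', messages),
// );
// stop(); // call when updates are no longer needed
```

Every tick costs a full HTTP request even when nothing changed, which is the overhead the later approaches try to avoid.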

Long Polling

Desperate Times Call for Desperate Measures

This just means that the client calls the HTTP endpoint and the server does not resolve the request immediately; instead, it waits until a message needs to be delivered to the client. This approach worked but had a lot of disadvantages, like timeouts and latency, and it sounds very hacky to begin with.

HTTP streaming and Server-sent events (SSE)

These solutions work well but are better suited to one-way communication, as in use cases where the server needs to send a notification to its users, e.g. a new post was created.

WebSocket servers can (but don’t have to) run alongside a normal HTTP server, since WebSockets use their own URI schemes: ws, and wss for secured connections (the equivalent of HTTPS).

WebSockets are a simple protocol from the user’s perspective, as they mainly consist of 3 different events (the browser API also exposes an error event):

open
close
message

Connecting (Handshake)

All WebSocket connections start with a handshake, always initiated by the client, which, if successful, upgrades the connection from HTTP to the ws/wss protocol.

Notes:

A client can create as many connections as it desires.
Arbitrary headers cannot be set in the browser when setting up a WebSocket connection.

All the behavior below is defined at the browser level by the W3C’s HTML5 WebSocket API specification and at the protocol level by RFC 6455, “The WebSocket Protocol”. It is all managed by the system (the browser, in the case of JavaScript), so you don’t need to worry about it; the following is for information purposes only.

The handshake process consists of the following steps:

The client sends the following headers on a standard http(s) request:

GET / HTTP/1.1
Host: 127.0.0.1:8000
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: hXcK3GM6B2r6yj/4L0Vuqw==
Origin: http://localhost:3000
Sec-WebSocket-Version: 13

Sec-WebSocket-Key

This header is a random value (a base64-encoded nonce). The server takes it exactly as sent, appends the special string 258EAFA5-E914-47DA-95CA-C5AB0DC85B11, hashes the result with SHA-1, encodes the hash in base64, and returns it in the Sec-WebSocket-Accept header.

If the server accepts the connection, it will respond with the following headers:

HTTP/1.1 101 Switching Protocols
Connection: Upgrade
Sec-WebSocket-Accept: rXzSb8mhB4ljxko8kbyiCohJ4Fc=
Upgrade: websocket

The connection will be considered successful only when the server:

Replies with status code 101
Includes a Connection header with value Upgrade
Includes an Upgrade header with value websocket
Includes a Sec-WebSocket-Accept header with the value derived from the Sec-WebSocket-Key
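To make the handshake less abstract, here is a rough Node.js sketch of how the Sec-WebSocket-Accept value is derived and how a client could validate the server’s reply. It covers the status and header checks listed above plus verification of Sec-WebSocket-Accept, which RFC 6455 also requires of the client. The function names and the shape of the headers object are assumptions made for illustration; the sample key and expected result come from RFC 6455.

```javascript
const { createHash } = require('node:crypto');

// Fixed GUID defined by RFC 6455.
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Derives the Sec-WebSocket-Accept value from the client's Sec-WebSocket-Key.
// The key is used exactly as sent (it is not base64-decoded first).
function computeAcceptValue(secWebSocketKey) {
  return createHash('sha1')
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest('base64');
}

// Hypothetical client-side check of the server's handshake response.
// `headers` is assumed to be a plain object with lowercase header names.
function isHandshakeAccepted(statusCode, headers, sentKey) {
  return (
    statusCode === 101 &&
    String(headers.connection).toLowerCase() === 'upgrade' &&
    String(headers.upgrade).toLowerCase() === 'websocket' &&
    headers['sec-websocket-accept'] === computeAcceptValue(sentKey)
  );
}

// Sample key from RFC 6455.
console.log(computeAcceptValue('dGhlIHNhbXBsZSBub25jZQ=='));
// prints "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
```

In the browser none of this is needed, since the WebSocket constructor performs these checks itself.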

Example in JavaScript

All of that complexity is already handled by the browser, and establishing a WebSocket connection is quite straightforward:

const socket = new WebSocket('wss://example.com/socket');

socket.addEventListener('open', (event) => {
  console.log('WebSocket connection established!');
});

Message exchange

Once a connection has been established, it is kept alive both in the server and the client.

Both the server and the client can send messages at any moment, as either text or binary data (Blob or ArrayBuffer objects). Sending binary data is marginally faster because text data requires a UTF-8 conversion, but not by a big factor, since this conversion is quite fast nowadays.

Example in JavaScript

Sending messages is as simple as:

socket.send(message); // e.g: '{"a": 1}'

And receiving them:

socket.addEventListener('message', (event) => {
  console.log('Received message:', event.data);
});
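Since messages can arrive as text or as binary data, it can help to funnel both through one pair of helpers. The sketch below assumes that every message, text or binary, carries UTF-8 encoded JSON; encodeMessage and decodeMessage are hypothetical names, not part of the WebSocket API.

```javascript
// Serializes a value into the payload of a text message.
function encodeMessage(value) {
  return JSON.stringify(value);
}

// Decodes `event.data`, which is a string for text messages and an
// ArrayBuffer for binary ones when `socket.binaryType = 'arraybuffer'`.
// Assumption: binary payloads contain UTF-8 encoded JSON.
function decodeMessage(data) {
  const text = typeof data === 'string' ? data : new TextDecoder().decode(data);
  return JSON.parse(text);
}

// Example usage with the `socket` created earlier:
// socket.binaryType = 'arraybuffer'; // the browser default is 'blob'
// socket.send(encodeMessage({ a: 1 }));
// socket.addEventListener('message', (event) => {
//   console.log('Received message:', decodeMessage(event.data));
// });
```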

Closing

A WebSocket connection can be closed at any moment by either the client or the server.

From the client’s perspective

It’s very straightforward:

socket.close();

From the server:

It can be a bit trickier: if you wait for the clients to disconnect, some of them might hang, which will eventually cause performance issues. Two ways to address this are shown in the following code (taken from Stack Overflow; terminate() and wss.clients come from the ws library for Node.js):

Hard shutdown

With this approach, we ask the connection to close and terminate it on the next tick if it is still open, without waiting for the client to disconnect.

// Soft close
socket.close();

process.nextTick(() => {
  if ([socket.OPEN, socket.CLOSING].includes(socket.readyState)) {
    // Socket still hangs, hard close
    socket.terminate();
  }
});

Soft shutdown

Here, we give the clients some time (10 seconds in this example) to disconnect before terminating them.

// First sweep, soft close
wss.clients.forEach((socket) => {
  socket.close();
});

setTimeout(() => {
  // Second sweep, hard close
  // for everyone who's left
  wss.clients.forEach((socket) => {
    if ([socket.OPEN, socket.CLOSING].includes(socket.readyState)) {
      socket.terminate();
    }
  });
}, 10000);

Auth and security

Unfortunately, the WebSocket protocol doesn’t prescribe a way to handle authentication, but people have come up with many solutions over time.

As noted earlier, the most traditional way to authenticate on the modern web, sending a header with a token such as a JWT, is not possible with WebSockets in the browser, since the browser API doesn’t support custom headers.

This leaves us with solutions like the one presented in this document:

Sending credentials as the first message in the WebSocket connection

This method is fully reliable, but it moves authentication to the application layer and can expose you to information leaks if you’re not careful enough. Another negative point is that it allows anyone to open WebSocket connections to your server.
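A rough server-side sketch of this first-message approach, assuming a ws-style socket (an event emitter with once and close, as in the ws library for Node.js). requireFirstMessageAuth, verifyToken, and onAuthenticated are hypothetical names supplied by the application, and the timeout is one possible way to limit the cost of unauthenticated connections.

```javascript
// Hypothetical helper: treats the first message as the credentials and
// closes the socket if they are invalid or never arrive.
function requireFirstMessageAuth(socket, verifyToken, onAuthenticated, timeoutMs = 5000) {
  // Drop clients that connect but never send credentials.
  const timer = setTimeout(() => {
    socket.close(1008, 'Authentication timeout');
  }, timeoutMs);

  // Avoid a dangling timer if the client disconnects first.
  socket.once('close', () => clearTimeout(timer));

  socket.once('message', (data) => {
    clearTimeout(timer);
    // `verifyToken` is assumed to return a user object, or null when invalid.
    const user = verifyToken(data.toString());
    if (!user) {
      // 1008 is the "policy violation" close code.
      socket.close(1008, 'Invalid credentials');
      return;
    }
    // Only now should application message handlers be attached.
    onAuthenticated(user);
  });
}
```

On the client side, this pairs with sending the token inside the open event handler, before any other message.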

Adding credentials in the WebSocket URI as a query parameter

Another method is to send the token as a query parameter when opening the WebSocket connection, something like: wss://localhost:3000/ws?token=myToken. The downside is that this information might end up in your system’s logs. One way to mitigate this would be to use single-use tokens, but the industry tends to consider this risk unacceptable.
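For illustration only, reading such a token on the server could look like the sketch below. extractToken is a hypothetical helper; it assumes the server exposes the request target of the upgrade request as a path with a query string (as req.url does in Node.js).

```javascript
// Reads the `token` query parameter from the request target of the
// upgrade request, e.g. "/ws?token=myToken". Returns null when absent.
function extractToken(requestUrl) {
  // The base is only there so a relative path can be parsed; its value is irrelevant.
  const url = new URL(requestUrl, 'ws://placeholder');
  return url.searchParams.get('token');
}

console.log(extractToken('/ws?token=myToken')); // prints "myToken"
```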

Setting a cookie on the domain of the WebSocket URI

This solution is also reliable as long as your WebSocket server runs on the same domain as your HTTP server. If this is not the case, it won’t be possible, since the Same-Origin Policy doesn’t allow setting a cookie on a different origin.

But there are two ways to overcome this problem:

Move the WebSocket server to a subdomain of the main HTTP server, e.g. websocket.example.com
Use an iframe served from the WebSocket domain to set the cookie

If you go with this approach, keep in mind that as of today [2023-07-18 Tue] it won’t work in Google Chrome unless you also set the SameSite and Secure attributes, e.g. document.cookie = 'my_token=token; SameSite=None; Secure;'

WebSocket Auth and Security

Cross-Site WebSocket Hijacking

One thing to be aware of with the cookie-based authentication solution is that WebSocket connections are not restricted by the same-origin policy, which opens an attack vector called Cross-Site WebSocket Hijacking.

This means that if you set a cookie for the WebSocket domain, the cookie will be sent when connecting to the server no matter which website establishes the WebSocket connection. This is a security problem, since malicious websites can take advantage of the cookie the user already has for that domain to subscribe to messages from the WebSocket server as well.

One way to mitigate this problem is to always validate the Origin of the client before establishing the connection.
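A minimal sketch of such an Origin check, written for the upgrade event of a Node.js http.Server. The allow-list contents and the function names are assumptions made for illustration.

```javascript
// Assumption: the origins allowed to open WebSocket connections.
const ALLOWED_ORIGINS = new Set(['https://example.com']);

function isAllowedOrigin(origin) {
  return typeof origin === 'string' && ALLOWED_ORIGINS.has(origin);
}

// Rejects the HTTP upgrade request before any WebSocket is created.
// Returns true when the request was rejected.
function rejectUnknownOrigin(req, socket) {
  if (isAllowedOrigin(req.headers.origin)) return false;
  socket.write('HTTP/1.1 403 Forbidden\r\n\r\n');
  socket.destroy();
  return true;
}

// Example usage with a Node.js HTTP server:
// server.on('upgrade', (req, socket, head) => {
//   if (rejectUnknownOrigin(req, socket)) return;
//   // ...hand the request over to the WebSocket server here
// });
```

Keep in mind that the Origin header is only trustworthy when it comes from a browser; a non-browser client can send any value, so this check complements authentication rather than replacing it.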

You can find much more information in this article.

Resources

Binary vs text messages

https://stackoverflow.com/questions/7730260/binary-vs-string-transfer-over-a-stream

Shutting down WebSocket connections in the server

https://stackoverflow.com/questions/41074052/how-to-terminate-a-websocket-connection/49791634#49791634

Authentication in WebSockets

https://websockets.readthedocs.io/en/stable/topics/authentication.html

Cross-Site WebSocket Hijacking

https://christian-schneider.net/CrossSiteWebSocketHijacking.html

Websocket framed protocol

https://sookocheff.com/post/networking/how-do-websockets-work/