How to Reconnect WebSockets in a Realtime Web App Without Flooding the Server
Realtime web applications have been a growing trend lately, thanks to the underlying technology maturing in the major browsers. Over the next few years, it looks like more and more web apps are going to add realtime features to their UIs, and for developers this is going to introduce some interesting challenges.
One such challenge is deciding what your app should do when your backend becomes unreachable and the WebSocket connection is closed. Generally, the best approach is to gracefully turn off the features that depend on the connection and have the application start trying to reopen the WebSocket. Usually this results in some simple code that tries to create a new connection every few seconds.
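A minimal TypeScript sketch of that naive approach is below. It is illustrative only: `tryConnect` is a hypothetical callback that resolves to `true` once a new WebSocket opens and `false` when the attempt fails, and the 5-second `RETRY_INTERVAL_MS` is an assumed value, not one from a specific app.

```typescript
// Assumed fixed retry interval; every client uses the same value.
const RETRY_INTERVAL_MS = 5000;

// Resolve after `ms` milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Naive reconnect loop. `tryConnect` is a hypothetical callback that
// resolves true when a new WebSocket opens and false when it fails.
// Resolves with the number of attempts it took to reconnect.
async function reconnectWithFixedInterval(
  tryConnect: () => Promise<boolean>,
  intervalMs: number = RETRY_INTERVAL_MS,
): Promise<number> {
  let attempt = 0;
  while (true) {
    attempt++;
    if (await tryConnect()) {
      return attempt;
    }
    // Same delay on every client: clients that lost their connection
    // together keep retrying together.
    await sleep(intervalMs);
  }
}
```

In a browser, `tryConnect` could create a `new WebSocket(url)` and resolve on its `open` or `close` event. The detail that matters here is that `intervalMs` is identical on every client.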
From a client developer's point of view, this logic makes a lot of sense and is simple to implement. However, there is a major problem: every single client uses the same timing logic. That becomes bad news when your backend service finally comes back online, because if every client retries on the same interval, they will all try to open a new connection to the backend at close to the same time. For example, say you have a large office of a few thousand people who keep your web app running the entire day, and their WebSocket connections are all lost at the same moment. Using the logic above, once the server is reachable again, a few thousand clients will try to create a WebSocket connection at nearly the same instant. This can flood the server with requests and take it down again.
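One way to solve this problem is to use the Exponential Backoff algorithm. Its benefit is that it spreads the reconnection attempts across the browsers at random intervals instead of having everyone reconnect to the server at the same time. Because the range of possible wait times grows after each failed attempt, clients also back off further the longer the server stays down.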
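To see why the randomness matters, the rough simulation below models 3,000 clients (an illustrative number) that all lost their connection at t = 0, with the server coming back at t = 12 seconds. It is a deliberately simplified model, not a measurement: fixed-interval clients retry every 5 seconds, while randomized clients each pick a single wait within an assumed 30-second window, and earlier failed attempts are ignored. The function names are hypothetical, and `random` can be swapped out to make the result deterministic.

```typescript
// Simplified model: every client lost its connection at t = 0 and the
// server comes back at `serverBackSec`. All times are in seconds.

// Fixed interval: clients retry at intervalSec, 2 * intervalSec, ..., so the
// first attempt once the server is back happens at the same time for everyone.
function fixedIntervalAttempts(clients: number, intervalSec: number, serverBackSec: number): number[] {
  const t = Math.ceil(serverBackSec / intervalSec) * intervalSec;
  const times: number[] = [];
  for (let i = 0; i < clients; i++) {
    times.push(t);
  }
  return times;
}

// Randomized wait: each client's next attempt lands somewhere in the
// `maxWaitSec` seconds after the server returns.
function randomizedAttempts(
  clients: number,
  maxWaitSec: number,
  serverBackSec: number,
  random: () => number = Math.random,
): number[] {
  const times: number[] = [];
  for (let i = 0; i < clients; i++) {
    times.push(serverBackSec + random() * maxWaitSec);
  }
  return times;
}

// Number of attempts that fall into the busiest one-second window.
function peakAttemptsPerSecond(times: number[]): number {
  const counts: Record<number, number> = {};
  let peak = 0;
  for (const t of times) {
    const second = Math.floor(t);
    const count = (counts[second] ?? 0) + 1;
    counts[second] = count;
    peak = Math.max(peak, count);
  }
  return peak;
}

console.log(peakAttemptsPerSecond(fixedIntervalAttempts(3000, 5, 12))); // 3000
console.log(peakAttemptsPerSecond(randomizedAttempts(3000, 30, 12))); // varies per run, nowhere near 3000
```

In this toy model the fixed-interval clients all arrive in the same second, while the randomized ones are spread over the whole window.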
The algorithm works like so:
- On reconnection attempt k (starting at k = 1), wait a random interval between 0 and 2^k - 1 seconds before trying to reconnect.
- If the reconnection succeeds, reset k to 1.
- If the reconnection fails, increase k by 1 and go back to the first step.
- To truncate the maximum interval, stop increasing k once it reaches a chosen limit; every later attempt then draws from that same maximum range.
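The steps above can be sketched as two small TypeScript helpers. This is a hypothetical implementation under a few assumptions: delays are returned in milliseconds, `MAX_K = 5` is an illustrative ceiling for k, and the `random` parameter defaults to `Math.random` but can be replaced to make the result deterministic.

```typescript
// Assumed ceiling for k: the largest possible wait is 2^5 - 1 = 31 seconds.
const MAX_K = 5;

// Random wait, in milliseconds, before reconnection attempt k (k >= 1).
// With the default Math.random this is uniform in [0, 2^k - 1) seconds.
function backoffDelayMs(k: number, random: () => number = Math.random): number {
  const maxSeconds = Math.pow(2, k) - 1;
  return random() * maxSeconds * 1000;
}

// Update k after an attempt: reset to 1 on success, otherwise increase it
// by 1 until it reaches MAX_K (the truncation step).
function nextK(k: number, succeeded: boolean): number {
  return succeeded ? 1 : Math.min(k + 1, MAX_K);
}
```

A client would call `backoffDelayMs(k)` to decide how long to wait before each attempt and `nextK(k, succeeded)` afterwards to get the k for the next one.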
Applied to a WebSocket client with a 30-second cap, the logic works like so:
- When the WebSocket connection closes, wait a random interval between 0 and 1 second (2^1 - 1) before trying to reconnect.
- If the connection fails to reopen, increase the reconnection attempt count by 1 and generate a new random interval, this time between 0 and 3 seconds (2^2 - 1).
- Keep repeating the previous step as the upper bound grows (7 seconds, then 15) until it would exceed 30 seconds. From then on, every interval is a random value between 0 and 30 seconds.
- Once the connection has been re-established, reset the attempt count to 1.
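Putting those concrete steps together, a reconnect loop might look roughly like the TypeScript sketch below. It is a sketch under assumptions, not a drop-in implementation: `connectWithBackoff`, `openSocket`, and `CAP_SECONDS` are hypothetical names, and the WebSocket itself is hidden behind the `openSocket` callback so the retry logic can run outside a browser.

```typescript
// Cap on the random wait, in seconds.
const CAP_SECONDS = 30;

// Upper bound of the random wait before a given attempt: 1, 3, 7, 15, then
// 30 from attempt 5 onward (2^5 - 1 = 31 would exceed the cap).
function maxWaitSeconds(attempt: number): number {
  return Math.min(Math.pow(2, attempt) - 1, CAP_SECONDS);
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Retry until `openSocket` reports success, waiting a random
// 0..maxWaitSeconds(attempt) seconds before each try. `openSocket` is a
// hypothetical callback resolving true once the WebSocket opens and false if
// it fails. Every call starts again at attempt 1, which is the reset step.
// Resolves with the number of attempts that were needed.
async function connectWithBackoff(
  openSocket: () => Promise<boolean>,
  random: () => number = Math.random,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<number> {
  let attempt = 1;
  while (true) {
    await wait(random() * maxWaitSeconds(attempt) * 1000);
    if (await openSocket()) {
      return attempt;
    }
    attempt++; // maxWaitSeconds() keeps the wait capped at CAP_SECONDS
  }
}

// Possible browser wiring (hypothetical URL), shown as a comment only:
// function openSocket(): Promise<boolean> {
//   return new Promise((resolve) => {
//     const ws = new WebSocket("wss://example.com/realtime");
//     ws.onclose = () => resolve(false); // failed to open
//     ws.onopen = () => {
//       // A later drop starts a fresh backoff sequence at attempt 1.
//       ws.onclose = () => void connectWithBackoff(openSocket);
//       resolve(true);
//     };
//   });
// }
```

Passing `random` and `wait` as parameters is only there so the retry schedule can be checked without real timers; in a browser the defaults (`Math.random` and a `setTimeout`-based sleep) would apply.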
As realtime web apps become commonplace, knowing how to maintain the network connection to your backend services is increasingly important. Fortunately, this problem has already been solved for us: a well-known technique like Exponential Backoff lets clients reconnect without flooding our servers with connection attempts.