Writing a server and getting it on the internet can be pretty complicated by itself. But what happens once people start using your service: how do you update your running server without dropping requests?

You can’t just turn things off and on again like you would in development, especially if your service provides a critical utility that users expect to be working. From the moment you turn the old server off until the new one comes up, every in-flight and incoming request gets dropped.

Here’s what that would look like:

(requests are shown as arrows above server states)

You can see how any requests that overlap with the intermediate transition phase of shutting down the old server and bringing up the new one will fail.

Since we want to eliminate that intermediate transition phase, where no server is running, what if we brought server version 2 up before we shut down version 1? We can achieve this by putting both versions behind the load balancer at the same time so that requests are sent to both. Let’s see what that would look like:

In-flight requests to server version 1 still fail when it is shut down.

This looks a little better: we no longer have a period where there is no server running, but we still see failed requests. Why? Well, any request that was in-flight to server version 1 when it shut down gets cut short. We also have another problem, which is that server version 1 will keep accepting new requests as long as it remains behind the load balancer.

So what we need is a way to tell server version 1 to finish any in-flight requests without accepting any new ones. That way we can have both servers in the load balancer at the same time without server version 1 getting half the traffic, but it can still finish any in-flight requests while server version 2 handles any new requests.

This is where graceful shutdown comes in: it’s a way of telling a running server, “Hey, you’re about to shut down, stop accepting work and finish up anything you have remaining”. This is typically done by sending a termination signal (SIGTERM) to the server, which is then responsible for rejecting new requests, finishing any in-flight ones, and shutting down. Most servers support this. For example, the built-in Go HTTP server has a Shutdown() method, and the gRPC Go server has a GracefulStop() method.

One additional wrinkle is that the server needs to communicate to the load balancer that it is no longer accepting new traffic, and it does this by failing its readiness check. A readiness check is just a way for the load balancer to check if the server is ready to receive requests. It’s typically a path on the server like “/readyz” that will be called every few seconds as long as the server is in the load balancer. While it returns success, the server will get traffic. When it returns failure, the load balancer will stop sending new traffic.

So, when a server receives a termination signal, it should begin failing its readiness check as well as initiating a graceful shutdown. Here’s what that looks like:

All requests finish, the update goes smoothly.

Now we can see server version 1 finishes in-flight requests, while server version 2 smoothly comes in to accept new requests, and nobody is left hanging.

That’s graceful shutdown in a nutshell. It adds a bit of complexity, but most serious server libraries support it out of the box. It’s not typically the default behavior, though, so you have to do a bit of work to enable it.

Why bother with any of this? Well, if you’re running a service that people care about, you don’t want to drop their requests on the floor. It’s the right thing to do. But it’s also a prerequisite for running your service on something like preemptible instances, which can be restarted at any time. Preemptible instances cost only about half as much as normal instances, so doing a little work to support graceful shutdown can save you some money on compute.
