
I'm trying to find an optimal way to handle ongoing PostgreSQL transactions during the shutdown of a golang server running on Kubernetes.

Does it make sense to wait for transactions to finish when they are serving requests that came in through a server that has already shut down? And even if a transaction completes within the graceful shutdown timeout, will the server still be able to send the response?

If responding to ongoing requests during shutdown is not possible, I would rather cancel the context of all running transactions so they don't keep running on the database after the server terminates and add unnecessary load. But whenever I wait for transactions to finish, there seems to be a trade-off: the longer I wait for ongoing transactions to finish, the longer the container lingers with a non-responsive server that errors on every incoming request.

Here's some sample code that demonstrates this:

package main

import (
    "context"
    "log/slog"
    "net/http"
    "os/signal"
    "syscall"
    "time"

    "github.com/jackc/pgx/v5/pgxpool"
)

func main() {
    ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGQUIT, syscall.SIGINT)
    defer cancel()

    // db is used by the API handler functions
    db, err := pgxpool.NewWithConfig(ctx, <some_config>)
    if err != nil {
        slog.Error("failed to create the connection pool", "err", err)
        return
    }

    server := &http.Server{<some_values>}
    // buffered, so the goroutine can exit even if nobody receives the error
    serverErr := make(chan error, 1)
    go func() {
        serverErr <- server.ListenAndServe()
    }()

    select {
    case <-ctx.Done():
        // ctx is already cancelled at this point, so the shutdown deadline is
        // derived from a fresh context; deriving it from ctx would make
        // server.Shutdown return immediately with context.Canceled.
        if err := Shutdown(context.Background(), time.Second*10, server, db); err != nil {
            slog.Error("server failed to Shutdown", "err", err)
        }

    case err := <-serverErr:
        slog.Error("server failed to ListenAndServe", "err", err)
    }
}

func Shutdown(ctx context.Context, timeout time.Duration, server *http.Server, db *pgxpool.Pool) error {

    closeCtx, cancel := context.WithTimeout(ctx, timeout)
    defer cancel()

    // first, shut down the server to stop accepting new requests
    if err := server.Shutdown(closeCtx); err != nil {
        return err
    }

    // allow running transactions to finish, but if they don't finish within
    // ten seconds, cancel the context of all running transactions so that they
    // are forced to finish (albeit with an error).
    // waitForTransactions and cancelContextOfEveryTransaction are placeholders
    // for logic that has not been written yet.
    transactionsComplete := waitForTransactions(time.Second*10, db)
    if !transactionsComplete {
        cancelContextOfEveryTransaction()
    }

    // (*pgxpool.Pool).Close takes no arguments and blocks until every acquired
    // connection has been released, so call it only once we are sure that no
    // transactions are still running.
    db.Close()

    return nil
}

Would the optimal graceful termination sequence be:

  • Shut down the server.
  • Immediately cancel context of all ongoing requests (killing the transaction as soon as the database driver tries to do anything with it).
  • Close the connection pool.
  • Exit.

[edit]: alternative termination sequence (more graceful):

  • Termination signal is received.
  • The pod is in 'terminating' state and is removed from the load balancer.
  • Shut down the server with some timeout N.
  • Shut down the connection pool with a short timeout. Reasoning: once server.Shutdown has returned, no further responses can be sent, so the only reason to wait for ongoing transactions is to let background workers finish their work, such as writing logs to the database.
  • If there are still open transactions that prevent the connection pool from closing, kill these transactions and try to close the pool again.
  • Exit.
Alechko
  • An option would be to set the `server.BaseContext` func. Then you can simply cancel that base context, which will cancel the contexts of all active handlers. If you use the HTTP request context for your db operations, those should be cancelled as well. – Burak Serdar Aug 18 '23 at 16:35
  • If you cancel requests immediately, you can hardly call that a graceful shutdown. The first step should be to take the server out of the load balancing. I'm not too familiar with k8s, but I imagine you have health checks in place that can start failing immediately, causing new requests to be routed elsewhere. Then you can call Server.Shutdown with a grace period to let in-flight requests complete, and only after Shutdown returns should you forcefully cancel any remaining request contexts. – Peter Aug 18 '23 at 17:07
  • Indeed, the pod is removed from the load balancer as soon as it moves to the 'terminating' state, and the `server.Shutdown` call blocks until the timeout is reached or all ongoing requests are finished. I've added a suggestion to the original question reflecting what seems to be the most graceful way to act on pod termination. – Alechko Aug 21 '23 at 10:19

1 Answer


Why reinvent the wheel instead of using an existing library that does the magic for you?

In our production services, we have used this graceful shutdown lib a lot and never had issues with it. It waits until all HTTP requests are served (within a given timeout) and shuts down afterwards.

The usage couldn't be simpler. After adding it to your module with

go get -u github.com/TV4/graceful

(or, if it is already listed in your go.mod:

go mod download github.com/TV4/graceful

)

you only need to import it:

import (
    // ...

    "github.com/TV4/graceful"
)

and then you can replace all your code after instantiating a server (including your Shutdown function) with this one-liner:

server := ...
graceful.LogListenAndServe(server, logger)
shadyyx
  • My reasoning is a perhaps exaggerated need for control and insight into the application flow and lifecycle, mainly so I feel more comfortable around production issues: if I know how the system works, it's easier to deduce how some behavior comes to be. Additionally, my main question is about the graceful shutdown of the server AND the database, and when forceful transaction termination should be applied. The library you suggest seems to mainly address server shutdown. – Alechko Aug 22 '23 at 15:29