I'm trying to find an optimal way to handle ongoing PostgreSQL transactions during the shutdown of a golang server running on Kubernetes.
Does it make sense to wait for transactions to finish when these transactions are serving requests initiated by a server that has already shut down? And even if a transaction completes within the graceful shutdown timeout, will the server still be able to send the response?
Even if responding to ongoing requests during shutdown is not possible, I prefer to cancel the context of all running transactions so they don't continue to run on the database after the server terminates, adding unnecessary load. But whenever I wait for transactions to finish, there seems to be a trade-off: the longer I wait for ongoing transactions to finish, the longer the container exists with a non-responsive server that would error on each incoming request.
Here's some sample code that demonstrates this:
import (
	"context"
	"net/http"
	"os/signal"
	"syscall"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
)

func main() {
	ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGQUIT, syscall.SIGINT)
	defer cancel()

	// db is used by the API handler functions
	db, err := pgxpool.NewWithConfig(ctx, <some_config>)
	if err != nil {
		logger.Error("failed to create connection pool", err)
		return
	}

	server := &http.Server{<some_values>}
	// buffered so the goroutine below never blocks on send
	serverErr := make(chan error, 1)
	go func() {
		serverErr <- server.ListenAndServe()
	}()

	select {
	case <-ctx.Done():
		// ctx is already cancelled at this point, so Shutdown gets a fresh
		// context; otherwise its timeout would expire immediately
		if err := Shutdown(context.Background(), time.Second*10, server, db); err != nil {
			logger.Error("server failed to Shutdown", err)
		}
	case err := <-serverErr:
		logger.Error("server failed to ListenAndServe", err)
	}
}

func Shutdown(ctx context.Context, timeout time.Duration, server *http.Server, db *pgxpool.Pool) error {
	closeCtx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	// first, shut down the server to stop accepting new requests
	if err := server.Shutdown(closeCtx); err != nil {
		return err
	}

	// allow running transactions to finish, but if they don't finish within
	// ten seconds, cancel the context of all running transactions so that they
	// are forced to finish (albeit, with error)
	transactionsComplete := waitForTransactions(time.Second*10, db)
	if !transactionsComplete {
		cancelContextOfEveryTransaction()
	}

	// since this call blocks until all connections have been returned to the
	// pool, we must call it only once we are sure that there are no more
	// running transactions. pgxpool's Close takes no arguments.
	db.Close()
	return nil
}
Would the optimal graceful termination sequence be:
- Shut down the server.
- Immediately cancel context of all ongoing requests (killing the transaction as soon as the database driver tries to do anything with it).
- Close the connection pool.
- Exit.
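One possible way to get the "cancel the context of all ongoing requests" step is to derive every request context from a single cancellable base context through the `BaseContext` field of `http.Server`. The following is only a sketch under assumptions: `slowQuery` is a hypothetical stand-in for a pgx call that receives `r.Context()`, `cancelInFlightRequests` is a made-up name, and `httptest` is used just to keep the example self-contained.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// slowQuery is a hypothetical stand-in for a database call that honours
// context cancellation, the way a pgx query does when given r.Context().
func slowQuery(ctx context.Context) error {
	select {
	case <-time.After(5 * time.Second):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// cancelInFlightRequests serves one slow request, cancels the base context
// shared by all requests, and returns the error the "query" ended with.
func cancelInFlightRequests() error {
	baseCtx, cancelRequests := context.WithCancel(context.Background())
	defer cancelRequests()

	started := make(chan struct{})
	queryErr := make(chan error, 1)
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		close(started)
		queryErr <- slowQuery(r.Context())
	})

	srv := httptest.NewUnstartedServer(handler)
	// Every request context is derived from baseCtx, so cancelling it
	// cancels all in-flight requests at once.
	srv.Config.BaseContext = func(net.Listener) context.Context { return baseCtx }
	srv.Start()
	defer srv.Close()

	go func() {
		if resp, err := srv.Client().Get(srv.URL); err == nil {
			resp.Body.Close()
		}
	}()

	<-started        // the handler is now blocked inside slowQuery
	cancelRequests() // step 2 of the sequence: cancel every ongoing request
	return <-queryErr
}

func main() {
	fmt.Println("query ended with:", cancelInFlightRequests())
}
```

In the real server the same idea would mean setting `BaseContext` on the `http.Server` before `ListenAndServe` and calling the cancel function right after (or instead of waiting for) `server.Shutdown`. Handlers that pass a different context to the pool would not be affected by this.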
[edit]: alternative termination sequence (more graceful):
- Termination signal is received.
- The pod is in 'terminating' state and is removed from the load balancer.
- Shut down the server with some timeout N.
- Shut down the connection pool with a short timeout. Reasoning: since server.Shutdown has returned, no more responses will be sent. The only reason to wait for ongoing transactions is to let background workers finish their work, such as writing logs to the database.
- If there are still open transactions that prevent the connection pool from closing, kill these transactions and try to close the pool again.
- Exit.
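The "close the pool with a short timeout, then kill what is left" part of this sequence could look roughly like the sketch below. It assumes only that the pool's `Close()` blocks until all connections are returned, as `pgxpool.Pool.Close` does. The names `closer`, `fakePool`, `closePool` and `runDemo` are hypothetical and exist only for this illustration. Instead of calling `Close` a second time, the sketch keeps waiting on the first call after cancelling the remaining work.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// closer is the only part of the pool this sketch relies on.
type closer interface{ Close() }

// fakePool imitates a pool whose Close blocks while connections are in use.
type fakePool struct{ inUse sync.WaitGroup }

func (p *fakePool) Close() { p.inUse.Wait() }

// closePool waits up to grace for Close to return. If it does not, the
// remaining work is cancelled and closePool waits for Close once more.
// It reports whether the pool closed without forcing cancellation.
func closePool(c closer, grace time.Duration, cancelWork context.CancelFunc) bool {
	done := make(chan struct{})
	go func() {
		c.Close()
		close(done)
	}()

	select {
	case <-done:
		return true
	case <-time.After(grace):
	}

	cancelWork() // abort whatever still holds a connection
	<-done       // Close can return now that the work has been cancelled
	return false
}

// runDemo simulates one background transaction. When stuck is true the
// transaction only ends once its context is cancelled.
func runDemo(stuck bool) bool {
	workCtx, cancelWork := context.WithCancel(context.Background())
	defer cancelWork()

	work := 10 * time.Millisecond
	if stuck {
		work = time.Hour
	}

	pool := &fakePool{}
	pool.inUse.Add(1)
	go func() {
		defer pool.inUse.Done()
		select {
		case <-time.After(work):
		case <-workCtx.Done():
		}
	}()

	return closePool(pool, 200*time.Millisecond, cancelWork)
}

func main() {
	fmt.Println("short transaction, graceful close:", runDemo(false))
	fmt.Println("stuck transaction, graceful close:", runDemo(true))
}
```

The grace period here is independent of the timeout N given to `server.Shutdown`, so the sum of both has to stay below the pod's termination grace period for the process to exit on its own terms.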