
I'm implementing graceful termination for a Go application deployed on a Kubernetes cluster. I'm using pgxpool.Pool.

My main challenge now is to forcefully kill all queries from the Application that are running on the PostgreSQL server during shutdown.

The Deployment on Kubernetes is defined with terminationGracePeriodSeconds: 30 (default), so unless I kill all queries within 30 seconds the pod will be killed by Kubernetes and the queries will keep running until completion with no one on the other side waiting for the result. This impacts the performance and stability of the database.

In my code I'm catching termination signals and calling Shutdown(ctx context.Context) on my http.Server with a timeout. Once the server has been closed, I call Close() on the pgxpool.Pool, but Close doesn't take a context, and I don't see any other way to forcefully kill all queries and close the connections.

Here's the code in my main() function:

    // The sum of connPoolCloseTimeout and serverCloseTimeout must be less than 30
    // seconds - the time Kubernetes will wait after a SIGTERM before it kills the pod
    // (terminationGracePeriodSeconds)
    const connPoolCloseTimeout, serverCloseTimeout time.Duration = time.Second*10, time.Second*10

    go func() {
        signalChan := make(chan os.Signal, 1)
        signal.Notify(signalChan,
            syscall.SIGTERM,
            syscall.SIGQUIT,
            syscall.SIGINT,
        )
        sig := <-signalChan
        mainLogger.Warn("got signal", zap.String("signal", sig.String()))

        ctx, cancel := context.WithTimeout(context.Background(), serverCloseTimeout)
        defer cancel()
        if err := server.Shutdown(ctx); err != nil && err != http.ErrServerClosed {
            mainLogger.Warn("server Shutdown (forced shutdown due to timeout):", zap.Error(err))
        } else if err != nil {
            mainLogger.Warn("server Shutdown (error):", zap.Error(err))
        } else {
            mainLogger.Info("server Shutdown (clean)")
        }

        // Close waits for all acquired connections to be released.
        // Because of this we must add a deadline or this will block forever.
        doneChan := make(chan struct{}, 1)
        go func() {
            mainLogger.Info("Closing connection pool. Waiting for all connections to close")
            pool.Close()
            doneChan <- struct{}{}
        }()

        select {
        case <-doneChan:
            //Do nothing and continue
        case <-time.After(connPoolCloseTimeout):
            dbc.logger.Error("closing connection pool by force, deadline exceeded")
            // TODO: In this case, we couldn't manage to close the connection pool in time. The application will be terminated and ongoing queries will not be killed. If there's a method to force kill queries add it here
        }
    }()

Is there a good way (without manually sending RST packets to the PostgreSQL server or similar hacks) to forcefully close all running queries from a connection pool upon termination?

Will I have to execute each query with a Context and call the cancel function of every such context on shutdown, or will this also not help?

[I already have a timeout set for every DB query, but that timeout is rather high to allow long running queries to complete in a normal state of affairs]

Alechko
  • [This comment](https://github.com/jackc/pgx/issues/802#issuecomment-668713840) from jackc suggests just exiting the program (I cannot see any clean alternatives). – Brits Mar 23 '22 at 21:16
  • @Brits When the program exits, the currently executing queries keep running on the database unless, on shutdown, I call the cancel function of the context of every query. – Alechko Mar 29 '22 at 09:39
  • When `main` exits, all running goroutines (and, with them, the connections to the database) are shut down. The PostgreSQL server should detect this and stop the queries. Alternatively, calling your `cancel` function (on the outer `Context`) when you get the signal should cancel all active queries (assuming they are passed a context derived from that one). – Brits Mar 29 '22 at 19:09
  • What I observe is that calling `pgxpool.Pool.Close()` blocks while queries are running (I'm using `pg_sleep(100)` to create long-running queries). Additionally, when the server process is killed (non-graceful shutdown), its queries keep running on the Postgres DB. The only way I found to stop running queries is to cancel their context after a timeout upon receiving SIGTERM and before the pod is killed by k8s. Could be related to what's described in this article: https://engineering.zalando.com/posts/2015/04/how-to-fix-what-you-cant-kill-undead-postgresql-queries.html – Alechko Apr 06 '22 at 10:57
  • "PostgreSQL server should detect this and stop the queries" - I believe this part just isn't happening (using PG-13). It's easily reproducible: run a Go program that queries a PG-13 database with `pg_sleep(100);`, terminate the process with `kill -9 `, and see that the query is still running with `select * from pg_stat_activity;`. – Alechko Apr 06 '22 at 10:59
  • What happens when the connection drops [depends somewhat on the query and other factors](https://dba.stackexchange.com/questions/81408/is-a-postgres-long-running-query-aborted-if-the-connection-is-lost-broken). The `pgxpool.Pool.Close` [docs](https://pkg.go.dev/github.com/jackc/pgx/v4/pgxpool#Pool.Close) state that the call "Blocks until all connections are returned to pool and closed.", so what you are seeing there is expected. – Brits Apr 06 '22 at 18:51
  • Note that, as mentioned above, the approach I'd recommend is to have an application-level context that is passed everywhere (you can create child contexts as needed). Cancelling that will advise everything that you are shutting down. – Brits Apr 06 '22 at 20:06

0 Answers