I have a gRPC server and have implemented graceful shutdown for it, roughly like this:

func main() {
    // Some code
    term := make(chan os.Signal)
    go func() {
        if err := grpcServer.Serve(lis); err != nil {
            term <- syscall.SIGINT
        }
    }()

    signal.Notify(term, syscall.SIGTERM, syscall.SIGINT)
    <-term
    server.GracefulStop()
    closeDbConnections()
}

This works fine. If I instead call grpcServer.Serve() in the main goroutine and move the shutdown handling into another goroutine, the statements after server.GracefulStop() usually do not execute. If closeDbConnections() runs at all, only some of the DB connections get closed.

server.GracefulStop() is a blocking call, and grpcServer.Serve() definitely returns before server.GracefulStop() completes. So how long does the main goroutine take to stop once Serve() has returned?

The problematic code

func main() {
    term := make(chan os.Signal)
    go func() {
        signal.Notify(term, syscall.SIGTERM, syscall.SIGINT)
        <-term
        server.GracefulStop()
        closeDbConnections()
    }()
    if err := grpcServer.Serve(lis); err != nil {
        term <- syscall.SIGINT
    }
}

This case does not work as expected. After server.GracefulStop() is done, closeDbConnections() may or may not run (it usually does not run to completion). I tested the latter case by sending SIGINT with Ctrl-C from my terminal.

Can someone please explain this behavior?

uzumas
  • You haven't provided an example of the problematic code, but if you're not waiting for your goroutine to return in `main`, you're going to exit the program before it completes. – JimB Apr 22 '19 at 16:49
  • @JimB the goroutine is just running the server, and `main` doesn't return until `GracefulStop` finishes, so I don't see a problem there. @agyeya, in answer to your question, it takes no time. When `main` returns (at the end of the function), the process exits. – Adrian Apr 22 '19 at 16:53
  • @Adrian: they said "this works fine", it's the non-working case which we appear to not have. This sounds like the basic "main exits before goroutines" problem. – JimB Apr 22 '19 at 16:54
  • Oh I see what you're saying - yes, it would be a problem in the hypothetical (but not shown) code. – Adrian Apr 22 '19 at 16:56
  • Added the problematic code. I'm just trying to understand the problem with it; my use case is already satisfied by the other code. – uzumas Apr 23 '19 at 08:17
  • You’re not waiting for closeDbConnections to run, so it shouldn’t be surprising that it doesn’t run. – JimB Apr 23 '19 at 11:17
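
To make the point from the comments concrete, here is a minimal sketch of the second layout (Serve on the calling goroutine, shutdown in another goroutine) with the missing wait added. It uses only the standard library: `fakeServer`, `run`, and the `done` channel are hypothetical names for illustration, and `fakeServer` only imitates the blocking behaviour of Serve and GracefulStop. It is not the gRPC API.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fakeServer is a hypothetical stand-in for a gRPC server: Serve blocks
// until GracefulStop is called and then returns nil.
type fakeServer struct {
	quit chan struct{}
	once sync.Once
}

func newFakeServer() *fakeServer { return &fakeServer{quit: make(chan struct{})} }

func (s *fakeServer) Serve() error { <-s.quit; return nil }

func (s *fakeServer) GracefulStop() { s.once.Do(func() { close(s.quit) }) }

// run serves on the calling goroutine and shuts down from another one.
// Unlike the problematic code, it waits on done before returning, so the
// cleanup step always finishes.
func run(stop <-chan struct{}) []string {
	var (
		mu     sync.Mutex
		events []string
	)
	record := func(e string) {
		mu.Lock()
		defer mu.Unlock()
		events = append(events, e)
	}

	srv := newFakeServer()
	done := make(chan struct{})

	go func() {
		defer close(done) // signals that cleanup has finished
		<-stop            // stands in for <-term
		srv.GracefulStop()
		time.Sleep(50 * time.Millisecond) // simulated slow closeDbConnections()
		record("db connections closed")
	}()

	if err := srv.Serve(); err != nil {
		// A failed Serve is reported without waiting for a stop request.
		return []string{"serve error: " + err.Error()}
	}
	record("serve returned")

	<-done // without this receive, run could return before cleanup completes
	return events
}

func main() {
	stop := make(chan struct{})
	go func() {
		time.Sleep(10 * time.Millisecond)
		close(stop) // stands in for pressing Ctrl-C
	}()
	for _, e := range run(stop) {
		fmt.Println(e)
	}
}
```

If you delete the `<-done` line, `run` returns as soon as Serve does, which is the same race as in the question: in a real program `main` would return and the process would exit while the cleanup goroutine is still running.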

1 Answer

I'm not sure I understand your question (please clarify it), but I would suggest refactoring your main this way:

func main() {

    // ...

    errChan := make(chan error)
    // signal.Notify never blocks when sending, so the channel needs a buffer
    stopChan := make(chan os.Signal, 1)

    // bind OS events to the signal channel
    signal.Notify(stopChan, syscall.SIGTERM, syscall.SIGINT)

    // run blocking call in a separate goroutine, report errors via channel
    go func() {
        if err := grpcServer.Serve(lis); err != nil {
            errChan <- err
        }
    }()

    // terminate your environment gracefully before leaving main function
    defer func() {
        server.GracefulStop()
        closeDbConnections()
    }()

    // block until either OS signal, or server fatal error
    select {
    case err := <-errChan:
        log.Printf("Fatal error: %v\n", err)
    case <-stopChan:
    }
}

I don't think it's a good idea to mix system events and server errors the way your example does: if Serve fails, you discard the error and emit a system event that never actually happened. Try an approach with two separate transports (channels), one for each kind of event that can cause the process to terminate.
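
As a rough, standard-library-only illustration of the two-channel idea, the sketch below keeps server errors and OS signals on separate channels so the reason for shutting down is preserved. `waitForShutdown` and `demo` are hypothetical helper names, and the error and signal are sent by hand here instead of coming from Serve and signal.Notify.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// waitForShutdown blocks until either the server reports an error or an OS
// signal arrives, and describes which of the two happened.
func waitForShutdown(errCh <-chan error, sigCh <-chan os.Signal) string {
	select {
	case err := <-errCh:
		return "server error: " + err.Error()
	case sig := <-sigCh:
		return "signal: " + sig.String()
	}
}

// demo feeds one event of each kind through the channels, one at a time.
func demo() (fromErr, fromSig string) {
	errCh := make(chan error, 1)
	sigCh := make(chan os.Signal, 1)

	// Simulate Serve failing: only the error channel has a value.
	errCh <- errors.New("listener closed")
	fromErr = waitForShutdown(errCh, sigCh)

	// Simulate a termination request: only the signal channel has a value.
	sigCh <- syscall.SIGTERM
	fromSig = waitForShutdown(errCh, sigCh)

	return fromErr, fromSig
}

func main() {
	fromErr, fromSig := demo()
	fmt.Println(fromErr)
	fmt.Println(fromSig)
}
```

Because each channel carries only one kind of event, the caller can log a real Serve error as an error and treat a signal as a normal shutdown request, instead of faking a SIGINT.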

Vitaly Isaev
  • Hi, I have added the problematic code. Let me know if there seems to be a gap in my understanding of goroutine execution here. – uzumas Apr 23 '19 at 04:11
  • @agyeya I would suggest studying the code sample I provided above and comparing it with yours. This is how we do it in our backend. This approach is well tested and reliable. – Vitaly Isaev Apr 23 '19 at 07:15