3

I'm trying to stop all clients connected to a stream server from server side. Actually I'm using GracefulStop method to handle it gracefully.

I am waiting for os.Interrupt signal on a channel to perform a graceful stop for gRPC. but it gets stuck on server.GracefulStop() when the client is connected.

func (s *Service) Subscribe(_ *empty.Empty, srv clientapi.ClientApi_SubscribeServer) error {
    ctx := srv.Context()

    updateCh := make(chan *clientapi.Update, 100)
    stopCh := make(chan bool)
    defer func() {
        stopCh<-true
        close(updateCh)
    }

    go func() {
        ticker := time.NewTicker(1 * time.Second)
        defer func() {
            ticker.Stop()
            close(stopCh)
        }
        for {
            select {
            case <-stopCh:
                return
            case <-ticker.C:
                updateCh<- &clientapi.Update{Name: "notification": Payload: "sample notification every 1 second"}
            }
        }
    }()

    for {
        select {
        case <-ctx.Done():
            return ctx.Err()

        case notif := <-updateCh:
            err := srv.Send(notif)
            if err == io.EOF {
                return nil
            }

            if err != nil {
                s.logger.Named("Subscribe").Error("error", zap.Error(err))
                continue
            }
        }
    }
}

I expected the context in method ctx.Done() could handle it and break the for loop. How to close all response streams like this one?

colm.anseo
  • 19,337
  • 4
  • 43
  • 52
rezamm
  • 71
  • 1
  • 2
  • 9

2 Answers2

8

Create a global context for your gRPC service. So walking through the various pieces:

  • Each gRPC service request would use this context (along with the client context) to fulfill that request
  • os.Interrupt handler would cancel the global context; thus canceling any currently running requests
  • finally issue server.GracefulStop() - which should wait for all the active gRPC calls to finish up (if they haven't see the cancelation immediately)

So for example, when setting up the gRPC service:

pctx := context.Background()
globalCtx, globalCancel := context.WithCancel(pctx)

mysrv := MyService{
    gctx: globalCtx
}

s := grpc.NewServer()
pb.RegisterMyService(s, mysrv)

os.Interrupt handler initiates and waits for shutdown:

globalCancel()
server.GracefulStop()

gRPC methods:

func(s *MyService) SomeRpcMethod(ctx context.Context, req *pb.Request) error {

    // merge client and server contexts into one `mctx`
    // (client context will cancel if client disconnects)
    // (server context will cancel if service Ctrl-C'ed)

    mctx, mcancel := mergeContext(ctx, s.gctx)

    defer mcancel() // so we don't leak, if neither client or server context cancels

    // RPC WORK GOES HERE
    // RPC WORK GOES HERE
    // RPC WORK GOES HERE

    // pass mctx to any blocking calls:
    // - http REST calls
    // - SQL queries etc.
    // - or if running a long loop; status check the context occasionally like so:

    // Example long request (10s)
    for i:=0; i<10*1000; i++ {
        time.Sleep(1*time.Milliscond)

        // poll merged context
        select {
            case <-mctx.Done():
                return fmt.Errorf("request canceled: %s", mctx.Err())
            default:
        }
    }
}

And:

func mergeContext(a, b context.Context) (context.Context, context.CancelFunc) {
    mctx, mcancel := context.WithCancel(a) // will cancel if `a` cancels

    go func() {
        select {
        case <-mctx.Done(): // don't leak go-routine on clean gRPC run
        case <-b.Done():
            mcancel() // b canceled, so cancel mctx 
        }
    }()

    return mctx, mcancel
}
colm.anseo
  • 19,337
  • 4
  • 43
  • 52
  • FWIW the check of s.gctx at the top of the service handler isn't necessary. `GracefulStop` will cause the server to stop accepting new requests entirely. – Doug Fawley Oct 21 '19 at 16:46
  • @DougFawley good point. Updated answer. Also fixed a potential context leak in `mergeContext`. – colm.anseo Oct 21 '19 at 18:18
  • @colminator what if I don't merge contexts and have a `case` for each? does that cause a context leak? – rezamm Oct 22 '19 at 06:25
  • The reason to merge them into a single context, is so you can pass a single context to any other potentially blocking call like a HTTP request or SQL query. Those types of call can only take a single context value. And you want those calls to detect any cancellations that your RPC may receive - client or server. – colm.anseo Oct 22 '19 at 09:55
  • 1
    @colminator I think your select inside the goroutine in `mergeContext` also needs a case for `mctx.Done()`. Otherwise that goroutine will leak even when `mcancel` is called. – Doug Fawley Oct 22 '19 at 19:56
  • @DougFawley I think the client context will be reclaimed on calls which would trigger the `mctx` to clean-up. But just in case I update to your suggestion. I also condensed it some more - as I noticed I could make `a` the parent context of `mctx` - and then only have to monitor `b` (and mctx of course). – colm.anseo Oct 22 '19 at 20:25
  • @colminator - you are right about the service handler's context being canceled, but if `mergeContext` is intended to be a generic function (for use outside gRPC) then it's possible the children will never be canceled. The update looks good and nice idea making `mctx` a child of `a`. With that change, you could then update the service handler to do: `ctx, cancel := mergeContext(ctx, s.gctx)`, to avoid mistakenly blocking on the un-merged `ctx` directly. (Any values in `ctx` will also be available via `mctx`.) – Doug Fawley Oct 23 '19 at 21:22
1

Typically clients need to assume that RPCs can terminate (e.g. due to connection errors or server power failure) at any moment. So what we do is GracefulStop, sleep for a short time period to allow in-flight RPCs an opportunity to complete naturally, then hard-Stop the server. If you do need to use this termination signal to end your RPCs, then the answer by @colminator is probably the best choice. But this situation should be unusual, and you may want to spend some time analyzing your design if you do find it is necessary to manually end streaming RPCs at server shutdown.

Doug Fawley
  • 953
  • 5
  • 7