
I have an async server that streams data to a single async client. I would like to be able to cancel the streaming from the client side and have the server stop streaming.

Currently, I run the client and server in two separate processes on my local Windows 10 machine, but I have also tried running the client on a separate machine and it behaves the same.

My server side endpoint is configured like so:

const auto server_grpc_port = "50051";
const auto server_endpoint = std::string("0.0.0.0:") + server_grpc_port;
serverBuilder.AddListeningPort(server_endpoint, grpc::InsecureServerCredentials());

My client side endpoint is configured like so:

const auto server_grpc_port = "50051";
const auto client_endpoint = std::string("localhost:") + server_grpc_port;
remoteStub = std::make_unique<MyRemoteApp::Stub>(grpc::CreateChannel(client_endpoint, grpc::InsecureChannelCredentials()));

After I start both client and server, I initiate an asynchronous server streaming. At some point, I trigger cancellation from the client side, which should cause the client to stop reading and the server to stop writing. I follow the method described in an answer and a GitHub issue:

Server Side

  1. Create a grpc::ServerContext instance.
  2. Call grpc::ServerContext::AsyncNotifyWhenDone(cancellation_tag). Once the cancellation_tag appears on the completion queue, we may invoke grpc::ServerContext::IsCancelled() to determine whether the client has cancelled the RPC.
  3. Wait for the RPC streaming to be initiated by the client: server->RequestMyStreamingRPC(... token ...)
  4. Issue another write each time the token arrives at the CompletionQueue.
  5. If the cancellation_tag arrives at the CompletionQueue, stop the streaming (see the sketch after this list).
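
For reference, here is a minimal sketch of steps 1-3, assuming the generated service type is MyRemoteApp::AsyncService, illustrative message types StreamRequest and DataChunk, and arbitrary tag values (none of these names are prescribed by gRPC itself):

MyRemoteApp::AsyncService service;
// The completion queue must be obtained before BuildAndStart.
std::unique_ptr<grpc::ServerCompletionQueue> cq = serverBuilder.AddCompletionQueue();

grpc::ServerContext context;
StreamRequest request;
grpc::ServerAsyncWriter<DataChunk> writer(&context);

void* cancellation_tag = reinterpret_cast<void*>(1);
void* stream_tag = reinterpret_cast<void*>(2);

// Step 2: cancellation_tag surfaces on the queue when the RPC finishes for
// any reason; IsCancelled() then tells us whether the client cancelled it.
// This must be called before the RPC is requested.
context.AsyncNotifyWhenDone(cancellation_tag);

// Step 3: stream_tag surfaces once a client initiates the RPC.
service.RequestMyStreamingRPC(&context, &request, &writer,
                              cq.get(), cq.get(), stream_tag);

// Steps 4-5 happen in the completion-queue loop: each completed write issues
// the next writer.Write(chunk, stream_tag); once cancellation_tag arrives and
// context.IsCancelled() returns true, we stop writing.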

Client Side

  1. Create a grpc::ClientContext instance.
  2. Initiate the RPC with Stub::PrepareAsync<>.
  3. Call reader->Read as many times as we wish to receive data from the server.
  4. At some point, call grpc::ClientContext::TryCancel().
  5. Call reader->Finish, which returns a CANCELLED Status.
  6. Destroy the grpc::ClientContext instance and the reader (see the sketch after this list).
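
Here is a matching sketch of the client side, assuming the generated method is PrepareAsyncMyStreamingRPC and the same illustrative message types. In the real client, each async call below is followed by a wait for its tag on the completion queue before the next step is issued:

grpc::ClientContext context;
grpc::CompletionQueue cq;
StreamRequest request;
DataChunk chunk;

// Step 2: prepare the RPC, then start it.
std::unique_ptr<grpc::ClientAsyncReader<DataChunk>> reader =
    remoteStub->PrepareAsyncMyStreamingRPC(&context, request, &cq);
reader->StartCall(reinterpret_cast<void*>(1));

// Step 3: each Read posts its tag once a message (or end of stream) arrives.
reader->Read(&chunk, reinterpret_cast<void*>(2));

// Step 4: cancel from the client side.
context.TryCancel();

// Step 5: Finish reports the final status, expected to be CANCELLED here.
grpc::Status status;
reader->Finish(&status, reinterpret_cast<void*>(3));

// Step 6: after draining the outstanding tags from cq, destroy the reader
// and the context.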

However, the cancellation_tag never reaches the server. Only when I destroy the Stub instance on the client side do I finally receive the cancellation_tag on the server's CompletionQueue. If I keep the stub alive, the server just keeps streaming data forever, as if there were a client reading it.

After investigating this further, it seems the problem does not occur when both client and server run in the same process, nor when I implement a simple synchronous server. In these cases, cancellation works as expected.

So what could be going wrong here? Could there be something wrong with how the asynchronous server handles cancellation?


1 Answer


After investigating this further, I think I found the issue. It seems to stem from an undocumented aspect of how a CompletionQueue behaves.

I was using a single thread for the whole server program, so the completion handlers are invoked on the same thread that calls AsyncNext, like so:

while (server_active)
{
   void* tag = nullptr;
   bool ok = false;
   // AsyncNext requires a deadline; a short one turns this into a polling loop.
   auto status = _completionQueue->AsyncNext(&tag, &ok,
       std::chrono::system_clock::now() + std::chrono::milliseconds(100));
   if (status == grpc::CompletionQueue::GOT_EVENT)
   {
       CallHandler(tag, ok); // does the logic, probably another write
   }
}

Whenever a write would complete, it would immediately trigger another write.
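
To illustrate, the handler looked roughly like the fragment below, assuming the writer, context, and tags from the sketch in the question; NextChunk and StopStreaming are hypothetical helpers:

// Illustrative fragment: a completed write immediately queues the next one,
// which posts yet another write completion to the queue.
void CallHandler(void* tag, bool ok)
{
   if (tag == stream_tag && ok)
   {
       writer.Write(NextChunk(), stream_tag); // keeps feeding the queue
   }
   else if (tag == cancellation_tag && context.IsCancelled())
   {
       StopStreaming(); // never reached while the loop kept finding write tags
   }
}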

It seemed that when the client triggered a cancellation, the relevant completion tag was in fact inserted into the queue, but the draining loop never reached it, since it kept adding more write completions. It is as if the queue behaves in a last-in-first-out manner.

When I modified the loop to first drain the queue, and then invoke the handlers, I immediately got the expected behavior.

struct Completion { void* tag; bool ok; };

while (server_active)
{
   std::vector<Completion> completions;
   while (true) // drain the queue
   {
       void* tag = nullptr;
       bool ok = false;
       auto status = _completionQueue->AsyncNext(&tag, &ok,
           std::chrono::system_clock::now()); // immediate deadline: poll only
       if (status == grpc::CompletionQueue::GOT_EVENT)
       {
           completions.push_back({tag, ok});
       }
       else // TIMEOUT (queue drained) or SHUTDOWN
       {
           break;
       }
   }

   for (const auto& completion : completions)
   {
       CallHandler(completion.tag, completion.ok); // does the logic
   }
}
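
Draining first guarantees that every tag already sitting in the queue, including the cancellation tag, is handled in the current iteration before the handlers can enqueue fresh write completions, so the cancellation event can no longer be starved out by the continuous stream of writes.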