
I am wondering about the best practice for long-lived gRPC calls.

I have a typical Client --> Server call (both Go), and the server-side processing can take 20-30 seconds to complete. I need the client to wait until it is completed before moving on. Options that I see (and I don't love any of them):

  1. Set the timeout to an absurd length (e.g. 1 minute) and just wait. This feels like a hack, and I also expect to run into strange behavior in my service mesh with calls like this going on.
  2. Use a stream - I would still need to do option #1 here, and it really doesn't help me much since my response is really just unary; a stream doesn't do me much good.
  3. Polling - (I implemented this and it works, but I don't love it) - I do most of the processing asynchronously and have my original gRPC call return a transaction ID that is stored in Redis and holds the state of the transaction. I created a separate gRPC endpoint to poll the status of the transaction in a loop.
  4. Queue or stream (e.g. a Kafka topic) - set up the client to be a listener on something like a Kafka topic and have my server notify the queue/stream when it is done so that my client would pick it up. I thought this would work, but it seemed way over-engineered.

Option #3 is working for me but sure feels pretty dirty, and it makes me 100% dependent on Redis. Given that gRPC is built on HTTP/2, I would think there would be some sort of server-push option, but I am not finding any.

I fear that I am overlooking a simple way to handle this problem.

Thanks

mornindew
  • Have you looked into making your API a server-side streamed RPC? – kingkupps Apr 18 '20 at 01:34
  • Thank you. Maybe I am missing something about your question but #2 option was to use a stream. I don't think that will help as the response is still unary. I need to wait until it is fully completed before proceeding. This means that my connection timeout will need to be very long. – mornindew Apr 18 '20 at 02:52
  • Ahh, sorry, I misread that. I think #4 is probably your best option, especially if you can reuse the results of some of the requests. – kingkupps Apr 18 '20 at 04:42
  • What's the problem with an RPC taking 20-30 seconds? Why is a 1min timeout absurd? – Paul Hankin Apr 18 '20 at 09:25
  • Kafka is a bit over the top, imho, since you most likely do not want to look back in time. A simple message queue, for example [NATS](https://nats.io) or (if you want SaaS) SQS or [CloudAMQP](https://www.cloudamqp.com), should satisfy your requirements with a lot less overhead. Do not get me wrong: I love Kafka, and if you already use it in your project, you *might* be well off with it. But as a message queue, imho, it is limited to some very specific use cases. – Markus W Mahlberg Apr 18 '20 at 10:33
  • @Paul - I could be influenced by my experience with HTTP/1.1, but it seemed wrong to keep a call open for so long. I get that it is using a single bidirectional channel over HTTP/2, but all the examples of timeouts that I saw were measured in milliseconds; setting timeouts to minutes didn't appear in any example I found. Maybe I should think differently about HTTP/2? – mornindew Apr 18 '20 at 14:37

2 Answers


A long-lived gRPC channel is an important use case and is fully supported. However, one gRPC channel may have more than one TCP connection, and TCP can get disconnected due to inactivity. You can use keep-alive or HTTP/2 ping to keep the TCP connection alive. See this thread for more details. None of the options you mentioned addresses the fact that your server takes a while to respond; unless there's something I'm missing, nothing in your question is a gRPC issue.

Abhijit Sarkar
  • I have a concern about keep-alive with long-lived gRPC connections: if the server is deployed as a k8s pod with N replicas, a long-lived connection will stick to a single pod, defeating any proper load-balancing strategy. Am I correct? – Goose May 07 '22 at 00:56
  • Load balancing only applies when establishing a new connection, and the problem is not unique to k8s. After the pod is taken offline, the connection will have to drop and be re-established. – Scot Nov 02 '22 at 20:27

Use a bidirectional gRPC stream:

  1. The client invokes the server function and gets a stream.
  2. In one thread, the client submits/sends requests on the stream.
  3. In another thread, the client receives responses/results. Each response carries the request ID (or the entire request) so it can be correlated with its request.

Note: on any error, both threads stop using the stream; a new stream is created and used by both threads.
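The correlation step can be sketched independently of the transport. In this sketch (all names and types are illustrative), a buffered channel stands in for handing a response from the stream-reading thread back to the waiting sender:

```go
package main

import (
	"fmt"
	"sync"
)

// response models a message coming back off the stream; it carries the
// request ID so the receiver can route it to the right waiter.
type response struct {
	RequestID string
	Body      string
}

// correlator maps request IDs to the channel waiting for that response.
type correlator struct {
	mu      sync.Mutex
	pending map[string]chan response
}

func newCorrelator() *correlator {
	return &correlator{pending: make(map[string]chan response)}
}

// register is called by the sending thread before a request goes out.
func (c *correlator) register(id string) <-chan response {
	ch := make(chan response, 1)
	c.mu.Lock()
	c.pending[id] = ch
	c.mu.Unlock()
	return ch
}

// deliver is called by the receiving thread for each message off the stream.
func (c *correlator) deliver(r response) {
	c.mu.Lock()
	ch, ok := c.pending[r.RequestID]
	delete(c.pending, r.RequestID)
	c.mu.Unlock()
	if ok {
		ch <- r
	}
}

func main() {
	c := newCorrelator()
	waiter := c.register("req-1")

	// In the real client this would run in the stream-receiving goroutine.
	go c.deliver(response{RequestID: "req-1", Body: "done"})

	fmt.Println((<-waiter).Body)
}
```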

  • As it's currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Jun 21 '23 at 15:36