I'm implementing a custom load balancer for gRPC to enable client-side load balancing in my app. Due to business requirements, I need to use a response-time-based metric in my load balancing algorithm.

The problem is that the gRPC Picker interface, which is used for choosing subchannels in a load balancer, doesn't have any callback that is invoked after a request has been processed, so I can't measure execution time inside a Picker. To work around this, I tried using a client interceptor to measure the exact duration of RPCs, but it seems that interceptors don't have any information about the server address chosen by the load balancer.

Is there any other mechanism for combining load balancing with measuring RPC response time in gRPC?

MagicMan

1 Answer

One thing to be aware of is that the LB policy interface inside of C-core is not currently a public API. It is not a stable API and will very likely change from release to release, so if you implement your own LB policy, you will very likely need to make changes to keep it working when you upgrade gRPC. We would like to eventually provide a public, stable LB policy API, but we can't do that until after we finish the EventEngine effort, because the current (terrible) polling APIs are directly exposed to LB policies. And even once that happens, we may decide to make some structural changes to the current API before we stabilize it.

With that caveat out of the way, note that in our current implementation, LB policies can actually get a callback when the call completes. They can do this by having the picker return a recv_trailing_metadata_ready callback. The channel will invoke that callback when the call finishes. Note that the channel does not provide any synchronization for this callback; it will not be invoked in the WorkSerializer (which is used for all LB policy control plane operations) nor in the data plane mutex (held while the picker is called), so the LB policy implementation has to do its own synchronization in the callback. Also, there may currently be some edge cases where the callback doesn't get invoked when the pick result is discarded, but that's something we'll need to fix at some point anyway, so if you run into it, please file an issue about it.
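To make the synchronization point concrete, here is a minimal, self-contained sketch of the pattern. It does not use the real C-core types, since that API is internal and changes between releases. `LatencyTracker` and `MakeCompletionCallback` are hypothetical names invented for illustration, and the plain `std::function<void()>` only stands in for the callback the picker would return; the real callback's signature is not shown here. The smoothing factor and the choice of an exponentially weighted moving average are also assumptions. The part that carries over is that the tracker takes its own mutex, because the callback runs outside both the WorkSerializer and the data plane mutex.

```cpp
#include <chrono>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Assumed smoothing factor for the moving average (not from gRPC).
constexpr double kAlpha = 0.5;

// Hypothetical per-backend latency store shared between the LB policy,
// its pickers, and the call-completion callbacks.
class LatencyTracker {
 public:
  // Folds one observed latency into the moving average for `address`.
  void Record(const std::string& address, double millis) {
    std::lock_guard<std::mutex> lock(mu_);  // our own synchronization
    auto it = ewma_ms_.find(address);
    if (it == ewma_ms_.end()) {
      ewma_ms_[address] = millis;
    } else {
      it->second = kAlpha * millis + (1.0 - kAlpha) * it->second;
    }
  }

  // Returns the current average for `address`, or -1.0 if none recorded.
  double AverageMs(const std::string& address) const {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = ewma_ms_.find(address);
    return it == ewma_ms_.end() ? -1.0 : it->second;
  }

  // Returns the address with the lowest average, or "" if none recorded.
  std::string Fastest() const {
    std::lock_guard<std::mutex> lock(mu_);
    std::string best;
    double best_ms = 0.0;
    for (const auto& entry : ewma_ms_) {
      if (best.empty() || entry.second < best_ms) {
        best = entry.first;
        best_ms = entry.second;
      }
    }
    return best;
  }

 private:
  mutable std::mutex mu_;
  std::map<std::string, double> ewma_ms_;
};

// Stand-in for what a picker would attach to a pick result: the start
// time is captured at pick time, and the returned callback records the
// elapsed time when the channel reports that the call has finished.
std::function<void()> MakeCompletionCallback(
    std::shared_ptr<LatencyTracker> tracker, std::string address) {
  const auto start = std::chrono::steady_clock::now();
  return [tracker, address, start]() {
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    tracker->Record(address, elapsed.count());
  };
}
```

In a real policy, the picker would create one such callback per pick and consult something like `Fastest()` (or a weighted choice based on `AverageMs()`) when selecting a subchannel. Holding the tracker through a `shared_ptr` keeps it alive even if the picker that produced the callback has already been replaced.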

I hope this info is helpful!

Mark D. Roth