I have written a gRPC server that contains multiple RPC services; some are unary and some are server-side streaming.
It connects to a Kubernetes server, so I am using the Python Kubernetes client to query it.
Currently I am having some performance problems: I suspect that when multiple requests come in, the server waits for every worker to finish before it can serve the next incoming request.
This is how I start the server:

```python
import grpc
from concurrent import futures

def startServer():
    global server
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    servicer_grpc.add_Servicer_to_server(Servicer(), server)
    server.add_insecure_port('[::]:' + str(port))
    server.start()
```
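To illustrate what I think is happening, here is a standalone sketch (not my real server, and the task is made up) showing that a ThreadPoolExecutor with N workers queues the (N+1)th task until a worker frees up:

```python
import time
from concurrent import futures

def slow_task(i):
    time.sleep(0.2)  # stand-in for a slow Kubernetes query
    return i

start = time.monotonic()
with futures.ThreadPoolExecutor(max_workers=2) as pool:
    # Three tasks, two workers: the third task sits in the queue
    # until one of the first two finishes.
    handles = [pool.submit(slow_task, i) for i in range(3)]
    results = [h.result() for h in handles]
elapsed = time.monotonic() - start

print(results)        # [0, 1, 2]
print(elapsed > 0.3)  # True: the third task waited for a free worker
```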
My questions are:
How can I improve performance? Will increasing max_workers in the ThreadPoolExecutor help?
How can I diagnose the problem and isolate what is causing the slowdown?
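To make this concrete, here is the kind of instrumentation I could imagine adding myself — a hypothetical timing decorator (the handler below is a made-up stand-in) that logs how long each handler takes:

```python
import functools
import logging
import time

def timed(fn):
    """Hypothetical helper: log how long each RPC handler takes.

    Note: for a server-side streaming handler (a generator), this only
    times generator creation, not the streaming itself.
    """
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return fn(*args, **kwargs)
        finally:
            logging.info("%s took %.3fs", fn.__name__,
                         time.monotonic() - start)
    return wrapper

# Made-up stand-in for a unary handler, just to show the decorator in use.
@timed
def fake_handler(request):
    return request.upper()

print(fake_handler("ok"))  # prints "OK"
```

Is something along these lines the usual approach, or is there a more standard tool for this?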
Does the size of the response matter here, since I am streaming bytestrings to the client? Is there a way to measure the size of a response, and does it matter in Python gRPC?
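For example, since my responses are plain bytestring chunks, I imagine I could count bytes per stream with a small wrapper like the sketch below (names are made up; for a protobuf message, I understand `msg.ByteSize()` or `len(msg.SerializeToString())` gives the serialized size):

```python
def chunk_sizes(chunks):
    """Hypothetical wrapper: yield each chunk with the running byte
    total, so sizes could be logged while streaming."""
    total = 0
    for chunk in chunks:
        total += len(chunk)
        yield chunk, total

# Made-up stream of bytestring chunks.
stream = [b"a" * 1024, b"b" * 2048, b"c" * 512]
total = 0
for chunk, total in chunk_sizes(stream):
    pass
print(total)  # 3584 bytes streamed in this example
```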
I would like to know how you diagnose your own Python gRPC servers, so that you know where to improve.