According to this documentation, I would say you need to try two things:
- working with the asyncio API (if that's not already done) by doing something like:

  ```python
  async def run(stub: QueryStub) -> None:
      # loop variable renamed so it does not shadow the built-in `object`
      async for response in stub.ResponseMethod(empty_pb2.Empty()):
          print(response.attribute_i_need)
  ```

  Note that `Empty()` is used only because I do not know your API definition.
- second would be to try the experimental `SingleThreadedUnaryStream` feature (if applicable to your case) by doing:

  ```python
  options = [(grpc.experimental.ChannelOptions.SingleThreadedUnaryStream, 1)]
  with grpc.insecure_channel(target='localhost:50051', options=options) as channel:
      ...  # create your stub from this channel
  ```
What I tried
I don't really know if it covers your use case (you can give me more info on that and I'll update), but here is what I tried:
I have a schema like:
```proto
service TestService {
    rpc AMethod(google.protobuf.Empty) returns (stream Test) {} // stream is optional, I tried with both
}

message Test {
    repeated string message = 1;
    repeated string message2 = 2;
    repeated string message3 = 3;
    repeated string message4 = 4;
    repeated string message5 = 5;
    repeated string message6 = 6;
    repeated string message7 = 7;
    repeated string message8 = 8;
    repeated string message9 = 9;
    repeated string message10 = 10;
    repeated string message11 = 11;
}
```
on the server side (with asyncio) I have:

```python
async def AMethod(self, request: empty_pb2.Empty, unused_context) -> AsyncIterable[Test]:
    test = Test()
    for i in range(10):
        test.message.append(randStr())
        # repeat the append for every other field, or not
    for i in range(1000000):
        yield test
```
where `randStr` creates a random string of length 10000 (totally arbitrary),
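(The original helper is not shown; here is a minimal sketch of such a `randStr`, assuming the character set does not matter, only the length:)

```python
import random
import string

def randStr(length: int = 10000) -> str:
    # Build a random ASCII string of the given (arbitrary) length.
    return ''.join(random.choices(string.ascii_letters, k=length))
```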
and on the client side (with `SingleThreadedUnaryStream` and asyncio):

```python
async def run(stub: TesterStub) -> None:
    tests = stub.AMethod(empty_pb2.Empty())
    async for test in tests:
        print(test.message)
```
Benchmark
Note: This might vary depending on your machine
For the example with only one repeated field filled, I get an average (ran 3 times) of 77 sec. With all the fields filled it takes far too long, so I tried smaller strings (length 10) and it still takes too long; I think the mix of `repeated` and `stream` is not a good idea. I also tried without `stream` and got an average (ran 3 times) of 45 sec.
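To put those numbers in perspective, here is a rough back-of-the-envelope estimate of the raw payload in the one-field case, using the sizes from the test above (it ignores protobuf tag/length overhead, so it is a lower bound):

```python
# Payload estimate for the one-repeated-field benchmark above.
strings_per_field = 10       # appends in the server loop
string_length = 10_000       # randStr length
messages_streamed = 1_000_000

bytes_per_message = strings_per_field * string_length   # 100 KB per message
total_bytes = bytes_per_message * messages_streamed     # ~100 GB of string data
print(total_bytes)  # 100000000000
```

So even before any protobuf overhead, the stream moves on the order of 100 GB of string data, which goes a long way toward explaining the timings.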
My conclusion
This is really slow when all the repeated fields are filled with data, and ok-ish when only one is filled. But overall, I think asyncio helps.
Furthermore, this documentation explains that Protocol Buffers are not designed to handle large messages, but that they are great for handling individual messages within a large data set.
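In that spirit, one possible rework (a hypothetical sketch, not your actual schema) is to stream many small messages, each carrying a bounded slice of the data, instead of packing everything into huge repeated fields:

```python
from typing import Iterator, List

def chunked(items: List[str], chunk_size: int = 100) -> Iterator[List[str]]:
    # Yield bounded slices; each slice would become one small streamed
    # message instead of one message with enormous repeated fields.
    for i in range(0, len(items), chunk_size):
        yield items[i:i + chunk_size]
```

On the server you would then yield one protobuf message per chunk (e.g. `Test(message=chunk)`), keeping every individual message small.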
I would suggest that, if I got your schema right, you rethink the API design, because it does not seem optimal. But once again, I might not have understood the schema properly.