I know that Google states that protobufs aren't meant for large messages (i.e. greater than 1 MB), but I'm trying to stream a dataset of tens of megabytes over gRPC, and some people seem to say that's fine, or at least fine with some splitting...
However, when I try to send the array this way (repeated uint32), it takes about 20 seconds, even on the same local machine.
#proto
service PAS {
  // analyze single file
  rpc getPhotonRecords (PhotonRecordsRequest) returns (PhotonRecordsReply) {}
}

message PhotonRecordsRequest {
  string fileName = 1;
}

message PhotonRecordsReply {
  repeated uint32 PhotonRecords = 1;
}
where PhotonRecordsReply needs to carry roughly 10 million uint32 values...
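For context, here is roughly how the unary call gets made and timed from Python. This is a simplified sketch; the channel address and file name are placeholders, not my actual setup:

import grpc

import pas_pb2
import pas_pb2_grpc

# Simplified sketch of the unary call being timed; "localhost:50051" and
# "example.flb" are placeholders, not my actual address/file.
with grpc.insecure_channel("localhost:50051") as channel:
    stub = pas_pb2_grpc.PASStub(channel)
    reply = stub.getPhotonRecords(pas_pb2.PhotonRecordsRequest(fileName="example.flb"))
    print(len(reply.PhotonRecords))  # ~10 million uint32 values in one message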
Does anyone have an idea on how to speed this up? Or what technology would be more appropriate?
So I think I've now implemented streaming based on the comments and answers given, but it still takes the same amount of time:
#proto
service PAS {
  // analyze single file
  rpc getPhotonRecords (PhotonRecordsRequest) returns (stream PhotonRecordsReply) {}
}
import flb_tools
import pas_pb2
import pas_pb2_grpc


class PAS_GRPC(pas_pb2_grpc.PASServicer):

    def getPhotonRecords(self, request: pas_pb2.PhotonRecordsRequest, _context):
        raw_data_bytes = flb_tools.read_data_bytes(request.fileName)
        data = flb_tools.reshape_flb_data(raw_data_bytes)
        index = 0
        chunk_size = 1024
        len_data = len(data)
        while index < len_data:
            # last chunk
            if index + chunk_size > len_data:
                yield pas_pb2.PhotonRecordsReply(PhotonRecords=data[index:])
            # all other chunks
            else:
                yield pas_pb2.PhotonRecordsReply(PhotonRecords=data[index:index + chunk_size])
            index += chunk_size
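On the client side I just iterate over the stream and collect the chunks; again a simplified sketch, with the address and file name as placeholders:

import grpc

import pas_pb2
import pas_pb2_grpc

# Simplified sketch of how the stream is consumed and timed; the address
# and file name are placeholders.
with grpc.insecure_channel("localhost:50051") as channel:
    stub = pas_pb2_grpc.PASStub(channel)
    records = []
    for chunk in stub.getPhotonRecords(pas_pb2.PhotonRecordsRequest(fileName="example.flb")):
        records.extend(chunk.PhotonRecords)  # each reply carries at most 1024 values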
Min repro GitHub example