
I know that Google states that protobufs don't support large messages (i.e. greater than 1 MB), but I'm trying to stream a dataset that's tens of megabytes using gRPC, and some people seem to say that's OK, or at least OK with some splitting...

However, when I try to send an array this way (repeated uint32), it takes about 20 seconds, even on the same local machine.

#proto
service PAS {
  // analyze single file
  rpc getPhotonRecords (PhotonRecordsRequest) returns (PhotonRecordsReply) {}
}

message PhotonRecordsRequest {
  string fileName = 1;
}

message PhotonRecordsReply {
  repeated uint32 PhotonRecords = 1;
}

where PhotonRecordsReply needs to hold ~10 million uint32 values...

Does anyone have an idea on how to speed this up? Or what technology would be more appropriate?

I think I've now implemented streaming based on the comments and the answer given, but it still takes the same amount of time:

#proto
service PAS {
  // analyze single file
  rpc getPhotonRecords (PhotonRecordsRequest) returns (stream PhotonRecordsReply) {}
}

#python
import pas_pb2
import pas_pb2_grpc
import flb_tools

class PAS_GRPC(pas_pb2_grpc.PASServicer):

    def getPhotonRecords(self, request: pas_pb2.PhotonRecordsRequest, _context):
        # read the raw file and unpack it into a sequence of photon record values
        raw_data_bytes = flb_tools.read_data_bytes(request.fileName)
        data = flb_tools.reshape_flb_data(raw_data_bytes)
        index = 0
        chunk_size = 1024  # number of uint32 values per streamed reply
        len_data = len(data)
        while index < len_data:
            # last chunk
            if index + chunk_size > len_data:
                yield pas_pb2.PhotonRecordsReply(PhotonRecords=data[index:])
            # all other chunks
            else:
                yield pas_pb2.PhotonRecordsReply(PhotonRecords=data[index:index + chunk_size])
            index += chunk_size

Min repro Github example
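
For reference, a minimal sketch of the client side that consumes and times the stream (the `PASStub` name is what grpcio-tools generates for the service above; the server address and file path here are placeholders):

#python
import time

import grpc

import pas_pb2
import pas_pb2_grpc


def fetch_photon_records(target, file_name):
    # plaintext localhost channel (no SSL), matching the local test setup
    with grpc.insecure_channel(target) as channel:
        stub = pas_pb2_grpc.PASStub(channel)
        request = pas_pb2.PhotonRecordsRequest(fileName=file_name)
        start = time.time()
        records = []
        # server-streaming RPC: iterate over the chunked replies as they arrive
        for reply in stub.getPhotonRecords(request):
            records.extend(reply.PhotonRecords)
        print(f"{time.time() - start:.3f}s for {len(records)} photon records")
        return records


if __name__ == "__main__":
    # placeholder address and file path
    fetch_photon_records("localhost:50051", "path/to/data_file")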

joshp
  • The `getPhotonRecords` RPC in your example is **not** a streaming method. More precisely, the reference you included says that "messages larger than 1MB" are discouraged, but that streaming a "large dataset" as many smaller messages is a suitable approach. – DazWilkin Feb 04 '22 at 23:58
  • Sending messages comprising repeated (arrays) of `uint32` is likely "challenging". `uint32` (and other integer encodings in protos) uses variable-length encoding (illustrated just below these comments). See: https://developers.google.com/protocol-buffers/docs/proto#scalar – DazWilkin Feb 05 '22 at 00:03
  • @DazWilkin does the edit seem like it should run with streaming? it's still slow... – joshp Feb 07 '22 at 19:17
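
To illustrate the variable-length-encoding point from the comments: the wire size of a packed `repeated uint32` field depends on the magnitude of the values, not just how many there are. A minimal sketch, assuming the generated `pas_pb2` module from the proto above is importable:

#python
import pas_pb2

# one million small values encode as one varint byte each...
small = pas_pb2.PhotonRecordsReply(PhotonRecords=[1] * 1_000_000)
# ...while values near 2^32 need five varint bytes each
large = pas_pb2.PhotonRecordsReply(PhotonRecords=[2**32 - 1] * 1_000_000)

print(len(small.SerializeToString()))  # roughly 1 MB
print(len(large.SerializeToString()))  # roughly 5 MB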

1 Answer


If you change it over to use streams, that should help. It took less than 2 seconds to transfer for me; note that this was without SSL and on localhost. This is code I threw together, but I did run it and it worked. I'm not sure what happens if the file is not a multiple of 4 bytes, for example, and the byte order used when reading is Java's default (big-endian).

I made my 10 MB test file like this:

dd if=/dev/random  of=my_10mb_file bs=1024 count=10240

Here's the service definition. The only thing I added here was the `stream` keyword on the response.

service PAS {
  // analyze single file
  rpc getPhotonRecords (PhotonRecordsRequest) returns (stream PhotonRecordsReply) {}
}

Here's the server implementation.

public class PhotonsServerImpl extends PASImplBase {

  @Override
  public void getPhotonRecords(PhotonRecordsRequest request, StreamObserver<PhotonRecordsReply> responseObserver) {
    log.info("inside getPhotonRecords");
    
    // open the file, I suggest using java.nio API for the fastest read times.
    Path file = Paths.get(request.getFileName());
    try (FileChannel fileChannel = FileChannel.open(file, StandardOpenOption.READ)) {

      int blockSize = 1024 * 4;
      ByteBuffer byteBuffer = ByteBuffer.allocate(blockSize);
      boolean done = false;
      while (!done) {
        PhotonRecordsReply.Builder response = PhotonRecordsReply.newBuilder();
        // read up to one block (4 KB = 1024 ints) from the file.
        byteBuffer.clear();
        int read = fileChannel.read(byteBuffer);
        if (read < blockSize) {
          done = true;
        }
        // write to the response.
        byteBuffer.flip();
        for (int index = 0; index < read / 4; index++) {
          response.addPhotonRecords(byteBuffer.getInt());
        }
        // send the response
        responseObserver.onNext(response.build());
      }
    } catch (Exception e) {
      log.error("", e);
      responseObserver.onError(
          Status.INTERNAL.withDescription(e.getMessage()).asRuntimeException());
    }
    responseObserver.onCompleted();
    log.info("exit getPhotonRecords");

  }
}

The client just logs the number of records in each streamed message it receives.

public long getPhotonRecords(ManagedChannel channel) {
  if (log.isInfoEnabled())
    log.info("Enter - getPhotonRecords ");

  PASGrpc.PASBlockingStub photonClient = PASGrpc.newBlockingStub(channel);

  PhotonRecordsRequest request = PhotonRecordsRequest.newBuilder().setFileName("/udata/jdrummond/logs/my_10mb_file").build();

  photonClient.getPhotonRecords(request).forEachRemaining(photonRecordsReply -> {
    log.info("got this many photons: {}", photonRecordsReply.getPhotonRecordsCount());
  });

  return 0;
}
aerobiotic
  • I updated the question above with what I believe is a python version of your answer... but it still runs slow. Any insight on my possible error? – joshp Feb 07 '22 at 19:20
  • also just ran a time comparison, and the streaming takes 5s for 1MB of data, whereas the unary takes 10s... but it's still not much of a difference, and I seem to be seeing other places online where people have ms level speed! - https://github.com/grpc/grpc-dotnet/issues/1080 – joshp Feb 07 '22 at 22:26
  • Question: You are not running in the debugger, correct? That can affect times. My computer might be faster than yours? Also, you did see where I used streaming to send 10 MB in small pieces of about 4K each, right? – aerobiotic Feb 08 '22 at 13:06
  • I believe with Python there's no difference running with debugging or not, as it's JIT compiled... I checked and I was getting it in 1K pieces. Also, the computer I have should be quite fast, and I don't think it would account for something that's 10-100x slower than what I've seen posted on the github link above :\ – joshp Feb 08 '22 at 17:37
  • @aerobiotic I just created a min repro example and added it to the original post – joshp Feb 08 '22 at 18:28
  • @joshp - I was transferring 4K each time; not sure if that would make that much difference. Also, I used Java... though I have always been impressed with Python's speed, so not sure if that would account for it. I'll try to run your code if I can find the time. Please try 4K and perhaps experiment with that transfer size to find the sweet spot. – aerobiotic Feb 09 '22 at 09:53
  • @joshp - With your code I get `1.796321153640747s 7598609873 photons in 6990848 bins` when transferring a 10 MB file. This is similar to my statement of about 2 seconds for Java. So could it be something with your computer? At any rate, the use of streams for larger data sets is the correct answer. – aerobiotic Feb 09 '22 at 17:21
  • @joshp - With your code I get `$ python3 grpc_client.py`: `0.35590553283691406s 18506294 photons in 1008640 bins` and `0.28502464294433594s 18506294 photons in 1008640 bins`. Then I pointed it at a 10 MB file (`1.796321153640747s 7598609873 photons in 6990848 bins`) and commented out the unary call since that would not scale. At any rate, the use of streams for larger data sets is the correct answer. – aerobiotic Feb 09 '22 at 17:28
  • I ran it on my Windows box and got results/timing similar to yours... seems that this is caused by an M1 MacBook somehow, not sure how to fix it though... – joshp Feb 10 '22 at 17:42