I'd like to understand if there's a mechanism to control batch sizes being sent from server to client.
I've implemented the Python server from the GitHub repo and a basic F# client.
As a test, I've added a flight containing 1 million rows, which I'd like to send back to the client. At first, the client fails with the following gRPC exception:
One or more errors occurred. (Status(StatusCode="ResourceExhausted", Detail="Received message exceeds the maximum configured message size."))
As the error says, the maximum message size has been exceeded. As a fix, I can set the maximum allowed gRPC message size to unlimited, i.e.
let ops = new GrpcChannelOptions()
ops.MaxReceiveMessageSize <- Nullable()
let downloadChannel = GrpcChannel.ForAddress(uri, ops)
let downloadClient = new FlightClient(downloadChannel)
However, I'd like to understand whether there's a way to set the size of the batches the server sends to the client, i.e. in the do_get method of the server:
def do_get(self, context, ticket):
    key = ast.literal_eval(ticket.ticket.decode())
    if key not in self.flights:
        return None
    return pyarrow.flight.RecordBatchStream(self.flights[key])
I'd like to set the batch size when creating pyarrow.flight.RecordBatchStream. Looking at the documentation, the options available through pyarrow.ipc.IpcWriteOptions don't seem to allow the batch size to be set.
Thanks in advance for any help :)
UPDATE: see the accepted answer below, which led me down the correct path. I've updated my code as follows to fix the issue.
def do_get(self, context, ticket):
    key = ast.literal_eval(ticket.ticket.decode())
    if key not in self.flights:
        return None
    table = self.flights[key]
    # Stream the table as a sequence of record batches instead of one whole table.
    # from_batches is a static method, so RecordBatchReader is not instantiated.
    reader = pyarrow.ipc.RecordBatchReader.from_batches(table.schema, table.to_batches())
    return pyarrow.flight.RecordBatchStream(reader)