I have a dataframe with about 1.5 million rows. I want to convert this to a protobuf.

Naive method

# my_proto is the module generated with protoc from the .proto schema
import my_proto

pb = my_proto.Table()
# append one message per DataFrame row, setting each field individually
for _, row in big_table.iterrows():
    e = pb.rows.add()
    e.similarity = row["similarity"]
    e.id = row["id"]

The throughput is about 100 rows per second, so converting all 1.5 million rows takes on the order of four hours.
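
Even keeping the row-by-row loop, `itertuples()` is generally much faster than `iterrows()` because it avoids building a Series for every row. A sketch of that variant, using the same column names as above:

# itertuples() yields plain namedtuples instead of per-row Series,
# which usually speeds up iteration considerably
for row in big_table.itertuples(index=False):
    e = pb.rows.add()
    e.similarity = row.similarity
    e.id = row.id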

Is there a way to do this in a non-incremental fashion?
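
For example, something along these lines is what I have in mind (just a sketch; it assumes the generated module exposes a `Row` message type behind the repeated `rows` field, which depends on the actual .proto definition):

# build all messages up front and append them with a single extend() call
# NOTE: "Row" is an assumed name for the generated message type of pb.rows;
# the real name comes from the .proto schema
pb = my_proto.Table()
pb.rows.extend(
    my_proto.Row(similarity=s, id=i)
    for s, i in zip(big_table["similarity"].tolist(), big_table["id"].tolist())
)
data = pb.SerializeToString()

Whether this helps in practice probably depends on whether the C++-backed protobuf runtime is installed; the pure-Python implementation pays a per-message cost either way.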

user82395214
  • What's the context of your question? I can't tell if your question is pandas-centric or protoc-centric. Are you looking for a single pandas operation to transform your table? – willwrighteng Dec 03 '20 at 03:14
  • @will.cass.wrig I can convert the data frame to something else, like a dict or list; that part doesn't matter as much. What matters is doing batch operations when writing protobuf data. – user82395214 Dec 03 '20 at 03:17
  • Sorry, I'm not versed in protocol buffers, but it seems like they can be implemented asynchronously ([link](https://stackoverflow.com/questions/38387443/how-to-implement-a-async-grpc-python-server)). I would try tagging your post with `grpc`; there's a larger protobuf community under that tag vs `protoc` – willwrighteng Dec 03 '20 at 03:35

0 Answers