
I am new to protobuf and I have a question about how to generate a really big protobuf file. Using the Google tutorial as an example:

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

message AddressBook {
  repeated Person person = 1;
}   

I have to do something similar: I need to generate a lot of (about 200 million) messages in one file. If I try using

message AddressBook {
  repeated Person person = 1;
}

then memory would obviously run out quickly when using the AddressBook.writeTo() method, since the whole address book has to be built in memory before it can be serialized (a rough sketch of what I mean is below). Any suggestions on how to handle this case? Thanks
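A minimal sketch of that naive approach in Java (hypothetical code; the generated classes are assumed to come from the tutorial .proto, and the file name and data are made up):

import com.example.tutorial.AddressBookProtos.AddressBook;
import com.example.tutorial.AddressBookProtos.Person;
import java.io.FileOutputStream;
import java.io.OutputStream;

public class WriteAddressBook {
  public static void main(String[] args) throws Exception {
    AddressBook.Builder bookBuilder = AddressBook.newBuilder();
    // All 200 million Person messages are accumulated in memory here:
    for (int i = 0; i < 200_000_000; i++) {
      bookBuilder.addPerson(Person.newBuilder()
          .setName("Person " + i)  // example data
          .setId(i)
          .build());
    }
    try (OutputStream out = new FileOutputStream("addressbook.bin")) {
      // writeTo() needs the complete AddressBook message in memory at once
      bookBuilder.build().writeTo(out);
    }
  }
}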

user2403909

1 Answer


This is discussed in the Protobuf docs:

https://developers.google.com/protocol-buffers/docs/techniques#large-data

Large Data Sets

Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.

That said, Protocol Buffers are great for handling individual messages within a large data set. Usually, large data sets are really just a collection of small pieces, where each small piece may be a structured piece of data. Even though Protocol Buffers cannot handle the entire set at once, using Protocol Buffers to encode each piece greatly simplifies your problem: now all you need is to handle a set of byte strings rather than a set of structures.

Protocol Buffers do not include any built-in support for large data sets because different situations call for different solutions. Sometimes a simple list of records will do while other times you may want something more like a database. Each solution should be developed as a separate library, so that only those who need it need to pay the costs.
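For what it's worth, one common way to apply this in Java is to treat the file as a stream of length-delimited Person records rather than one huge AddressBook: write each message with writeDelimitedTo and read them back one at a time with parseDelimitedFrom, so only a single message is held in memory at any point. A minimal sketch (file name and data are made up; the generated class is assumed to come from the tutorial .proto above):

import com.example.tutorial.AddressBookProtos.Person;
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamingAddressBook {
  public static void main(String[] args) throws Exception {
    // Write: each Person is serialized as its own size-prefixed record,
    // so only one message exists in memory at a time.
    try (OutputStream out = new BufferedOutputStream(new FileOutputStream("addressbook.bin"))) {
      for (int i = 0; i < 200_000_000; i++) {
        Person person = Person.newBuilder()
            .setName("Person " + i)   // example data
            .setId(i)
            .build();
        person.writeDelimitedTo(out); // prefixes the message with its length
      }
    }

    // Read: parse one Person at a time; parseDelimitedFrom returns null at end of stream.
    try (InputStream in = new BufferedInputStream(new FileInputStream("addressbook.bin"))) {
      Person person;
      while ((person = Person.parseDelimitedFrom(in)) != null) {
        // process each person here
      }
    }
  }
}

The resulting file is no longer a single AddressBook message, but a sequence of independent byte strings, which is exactly the "collection of small pieces" the documentation describes.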

Kenton Varda
  • but this link said it's ok. https://stackoverflow.com/questions/47564437/why-protobuf-is-bad-for-large-data-structures – daohu527 Aug 06 '22 at 14:03
  • @daohu527 if you look closely at that answer, it says that you need to split the dataset up into many small messages, which is the same thing being said here. – Kenton Varda Aug 08 '22 at 21:32