
I need to serialize lots of objects to a file (multiple GBs). We have chosen to use Google's protocol buffers for other things in this project, so I thought I would use that to serialize the objects I receive from the wire. This seems to work:

File.open(file_name, 'ab') do |f|
  some_objects.each do |some_object|
    some_object.serialize(f)
  end
end

The deserialization is what is giving me issues. I have seen others parse one object like this:

File.open(file_name, 'r') do |f|
  no = some_object.parse(f)
end

But that only reads one object. I tried doing this:

File.open(file_name, 'r').each do |f|
  no = some_object.parse(f)
end

But that raised this exception:

Uncaught exception: undefined method `<<' for false:FalseClass

I need to get all of them and lazily evaluate them. Any thoughts? Please feel free to give any advice on the performance of this code since I'll be doing GBs of info. Thanks for your time.

By the way, I know I need to upgrade my ruby version, but since this is an internal thing I haven't been able to get time from the boss to upgrade it.

I am using ruby-protocol-buffers

user197674

1 Answer


Encoded protobufs are not self-delimiting, so if you write multiple messages to a stream and then try to parse them, the entire stream will be parsed as a single message, with later field values overwriting earlier ones. You will need to prefix each message with its size, then make sure to read only that many bytes on the receiving end.

https://developers.google.com/protocol-buffers/docs/techniques#streaming

Unfortunately I don't know Ruby so I can't give you code samples. It looks like the class LimitedIO in the Ruby protobuf library you linked might be useful for parsing messages without going past a certain length.
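To illustrate the length-prefix framing in Ruby, here is a minimal sketch. Plain byte strings stand in for serialized messages; with ruby-protocol-buffers you would presumably produce the bytes with something like `some_object.serialize_to_string` and parse them back with `SomeMessage.parse` (check the method names against your version of the library). The 4-byte big-endian prefix and the `Enumerator`-based lazy reader are just one way to do it:

```ruby
require 'stringio'

# Write each payload prefixed with a 4-byte big-endian length,
# so the reader knows where one message ends and the next begins.
def write_framed(io, payloads)
  payloads.each do |bytes|
    io.write([bytes.bytesize].pack('N'))  # 4-byte length prefix
    io.write(bytes)
  end
end

# Return a lazy Enumerator that yields one payload at a time,
# so a multi-GB file is never loaded into memory at once.
def each_framed(io)
  Enumerator.new do |y|
    until io.eof?
      len = io.read(4).unpack('N').first
      y << io.read(len)
    end
  end
end

buf = StringIO.new
write_framed(buf, ["first message", "second message"])
buf.rewind
each_framed(buf).to_a  # => ["first message", "second message"]
```

With real protobufs you would frame `serialize_to_string` output the same way, and map each yielded byte string through the message class's parse method as you consume the enumerator.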

Kenton Varda
  • Thanks. After getting more clarification, I found out we were using another protocol buffer for a certain number of objects. That got me closer. – user197674 Feb 20 '14 at 06:18