0

I have a solution where I need to be able to log multiple JSON Objects to a file. Essentially doing one log file per day. What is the easiest way to write (and later read) these from a single file?

How does MongoDB handle this with BSON? What does it use as a separator between "records"?

Does Protocol Buffers, BSON, MessagePack, etc... offer compression and the record concept? Compression would be a nice benefit.

BuddyJoe
  • 69,735
  • 114
  • 291
  • 466

1 Answers1

0

With protocol buffers you could define the message as follows:

Message JSONObject {
    required string JSON = 1;
}

Message DailyJSONLog {
  repeated JSONObject JSON = 1;
}

This way you would just read the file from memory and deserialize it. Its essentially the same way for serializing them as well. Once you have the file (serialized DailyJSONLog) on disk, you can easily just append serialized JSONObjects to the end of that file (since the DailyJSONLog message is very simply a repeated field).

The only issue with this is if you have a LOT of messages each day or if you want to start at a certain location during the day (you're not able to easily get to the middle (or arbitrary) of the repeated list).

I've gotten around this by taking a JSONObject, serializing it and then base64 encoding it. I'd store these to a file separating by a new line. This allows you to very easily see how many records are in each file, gain access to any arbitrary JSON object within the file and to trivially keep expanding the file (you can expand the above 'repeated' message as well pretty trivially but it is a one way easy operation...)

Compression is a different topic. Protocol Buffers will not compress strings. If you were to define a pb message to match your JSON message, then you will get the benefit of having pb possibly 'compress' any integers into their [varint][1] encoded format. You will get 'less' compression if you try above base64 encoding route as well.

g19fanatic
  • 10,567
  • 6
  • 33
  • 63