
I am trying to figure out whether I can use protobuf-net to store and retrieve the following serialized data structure:

I have about 200,000 objects of 16 bytes each per day (each object contains one long and two float values: 8 bytes plus 2 * 4 bytes) that I would like to store in a binary file. Retrieval will only ever be by full days. For example, I might request objects between April 1 2012 and April 6 2012, which should be read in order: April 1, then April 2, ... April 6. One requirement is random access: the file may contain data from 2010 to June 2012, but I may only want to retrieve elements between April 1 and April 6 2012 without having to read all elements from the start.

I currently store the data as contiguous byte arrays in DateTime tick order, without regard to where a day starts or ends. If I could use protobuf-net to stream the data as full-day "blobs" as IEnumerable, that would be terrific. Is that possible? I was thinking of storing an IEnumerable or List for each day, serialized with protobuf-net, but I am not sure how to randomly access a particular list later on. Any ideas or suggestions? Thanks

Matt

1 Answer


That isn't a typical use-case for protobuf-net, and while I suspect it would be possible to use it that way, it wouldn't be my instinct to try to do so. If the key requirement is to split by day, then using multiple files would be an obvious choice. Alternatively, tweak your existing file format to include (per day) the date stamp and the size of the data for that day - then you can just skip forward in entire days using the FileStream's .Position property.
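To make the per-day header idea concrete, here is a minimal sketch. It is written in Python for brevity rather than C#, and the header layout (a 4-byte date ordinal followed by a 4-byte payload size) and the names `write_day` and `read_days` are assumptions made up for this illustration, not an existing format. The relative seek plays the role that advancing the FileStream's `.Position` would play in .NET.

```python
import datetime as dt
import io
import struct

RECORD = struct.Struct("<qff")  # one 64-bit tick plus two 32-bit floats = 16 bytes
HEADER = struct.Struct("<iI")   # hypothetical header: date ordinal, payload size in bytes


def write_day(stream, day, records):
    """Append one day block: the header followed by the packed records."""
    payload = b"".join(RECORD.pack(*r) for r in records)
    stream.write(HEADER.pack(day.toordinal(), len(payload)))
    stream.write(payload)


def read_days(stream, first, last):
    """Yield (day, records) for first <= day <= last, skipping other days unread."""
    stream.seek(0)
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            return  # end of file
        ordinal, size = HEADER.unpack(header)
        day = dt.date.fromordinal(ordinal)
        if day > last:
            return  # blocks are assumed to be written in date order
        if day < first:
            stream.seek(size, io.SEEK_CUR)  # jump over the whole day
            continue
        payload = stream.read(size)
        yield day, [RECORD.unpack_from(payload, i) for i in range(0, size, RECORD.size)]
```

With this layout a reader touches only 8 header bytes for each day it skips. For a file spanning years, a separate small index of (date, offset) pairs would avoid even that walk, at the cost of keeping the index in sync.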

protobuf-net does have streaming and skipping APIs, or a raw "reader" API if you don't want to involve the core serializer, but: I'm not sure this is going to help you massively.

Frankly, since (in your current process) each chunk is fixed size, you could also use binary search (maybe using linear interpolation for the starting position) to just seek to the right time.
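A rough sketch of that search over the existing headerless layout follows, again in Python for brevity. It assumes 16-byte records sorted by tick with the tick stored first as a little-endian 64-bit integer; the byte order and the function names (`tick_at`, `lower_bound`, `read_range`) are assumptions for illustration only.

```python
import io
import struct

RECORD = struct.Struct("<qff")  # 64-bit tick plus two 32-bit floats = 16 bytes


def tick_at(stream, index):
    """Read only the tick of record number `index`."""
    stream.seek(index * RECORD.size)
    return struct.unpack("<q", stream.read(8))[0]


def lower_bound(stream, count, target):
    """Index of the first record whose tick is >= target (count if none)."""
    if count == 0:
        return 0
    first, last = tick_at(stream, 0), tick_at(stream, count - 1)
    if target <= first:
        return 0
    if target > last:
        return count
    lo, hi = 0, count
    # First probe by linear interpolation; later probes are plain bisection.
    mid = (count - 1) * (target - first) // (last - first)
    while lo < hi:
        if tick_at(stream, mid) < target:
            lo = mid + 1
        else:
            hi = mid
        mid = (lo + hi) // 2
    return lo


def read_range(stream, count, start_tick, end_tick):
    """Return all records with start_tick <= tick < end_tick."""
    begin = lower_bound(stream, count, start_tick)
    end = lower_bound(stream, count, end_tick)
    stream.seek(begin * RECORD.size)
    data = stream.read((end - begin) * RECORD.size)
    return [RECORD.unpack_from(data, i) for i in range(0, len(data), RECORD.size)]
```

Here `count` would be the file length divided by 16, and a day's worth of data is fetched by passing the ticks of midnight on the first day and midnight after the last day. Each lookup costs on the order of log2(count) small reads, so no index or change to the file format is needed.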

Marc Gravell
  • Thanks for your comments. If I were to store each day in a separate file, would you recommend storing the objects as a List or IEnumerable serialized with protobuf-net, or would it be faster to simply store and retrieve the raw byte arrays for a given day? – Matt Jun 08 '12 at 06:22
  • @Freddy if you have something that works, I'd stick with it. protobuf-net is efficient, but still has to do some level of processing as the format is more structured and flexible. It would also want to add some (very terse, i.e. typically 1 byte per field) headers into the output. In the case you describe, where the data is very simple (i.e. no nested/inner data, and a predictable layout), I'm not sure that protobuf-net is going to buy you anything. Where it excels is handling more general object serialization, which can be complex, and which needs to be... – Marc Gravell Jun 08 '12 at 06:26
  • ...extensible over time so that people can "version" the data easily (i.e. at some point a new field gets added), or handling complex object trees (A has a list of B, each of which has a C and a D, and various other fields at all levels). – Marc Gravell Jun 08 '12 at 06:27