I posted a related but still different question regarding Protobuf-Net before, so here goes:
I wonder whether someone (esp. Marc) could comment on which of the following would most likely be faster:
(a) I currently store serialized built-in datatypes in a binary file. Specifically, a long (8 bytes) and two floats (2x 4 bytes); every three of those later make up one object in deserialized state. The long represents DateTime ticks for lookup purposes. I use a binary search to find the start and end locations of a data request. A method then downloads the data in one chunk (from start to end location), knowing that the chunk consists of many of the above-described triplets (1 long, 1 float, 1 float) and that each triplet is always 16 bytes long. Thus the number of triplets retrieved is always (endLocation - startLocation) / 16. I then iterate over the retrieved byte array, deserialize each built-in type (using BitConverter), instantiate a new object from each triplet, and store the objects in a list for further processing.
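For reference, here is a minimal sketch of roughly what my (a) pipeline looks like; the `DataPoint` type, file handling, and error handling are simplified placeholders rather than my actual code:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Simplified triplet type; the real class carries more behaviour.
public class DataPoint
{
    public long Ticks;    // DateTime ticks, used for the binary-search lookup
    public float Value1;
    public float Value2;
}

public static class FlatReader
{
    // Reads the byte range [startLocation, endLocation) from the file and
    // turns every 16-byte record (long + float + float) into a DataPoint.
    // For brevity this assumes a single Read call fills the buffer.
    public static List<DataPoint> Read(string path, long startLocation, long endLocation)
    {
        int byteCount = (int)(endLocation - startLocation);
        var buffer = new byte[byteCount];

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            fs.Seek(startLocation, SeekOrigin.Begin);
            fs.Read(buffer, 0, byteCount);
        }

        var result = new List<DataPoint>(byteCount / 16);
        for (int offset = 0; offset < byteCount; offset += 16)
        {
            result.Add(new DataPoint
            {
                Ticks  = BitConverter.ToInt64(buffer, offset),
                Value1 = BitConverter.ToSingle(buffer, offset + 8),
                Value2 = BitConverter.ToSingle(buffer, offset + 12)
            });
        }
        return result;
    }
}
```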
(b) Would it be faster to do the following? Build a separate file (or implement a header) that functions as an index for lookup purposes. I would then no longer store individual binary versions of the built-in types, but instead use protobuf-net to serialize a List of the above-described objects (= a triplet of long, float, float as the source of each object). Each List would always contain exactly one day's worth of data (remember, the long represents DateTime ticks). Obviously each List would vary in size, hence my idea of generating another file or header for index lookups, because each data read request only ever asks for a multiple of full days. To retrieve the serialized list for one day, I would simply look up the index, read the byte array, deserialize it using protobuf-net, and already have my List of objects. I guess I am asking because I do not fully understand how deserialization of collections works in protobuf-net.
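To make (b) concrete, this is roughly how I imagine it would look, using the same hypothetical `DataPoint` type (now decorated for protobuf-net) and the standard Serializer API; the index layout, member numbers, and error handling are just placeholders:

```csharp
using System.Collections.Generic;
using System.IO;
using ProtoBuf;

// Hypothetical contract type for option (b); member numbers are arbitrary.
[ProtoContract]
public class DataPoint
{
    [ProtoMember(1)] public long Ticks;
    [ProtoMember(2)] public float Value1;
    [ProtoMember(3)] public float Value2;
}

public static class DayStore
{
    // Writes one day's worth of points as a single protobuf-net message and
    // returns the (offset, length) pair to record in the separate index.
    public static (long Offset, long Length) AppendDay(Stream data, List<DataPoint> day)
    {
        long offset = data.Position;
        Serializer.Serialize(data, day);
        return (offset, data.Position - offset);
    }

    // Reads the byte range recorded in the index and deserializes it back
    // into the List<DataPoint> for that day.
    public static List<DataPoint> ReadDay(Stream data, long offset, long length)
    {
        data.Seek(offset, SeekOrigin.Begin);
        var buffer = new byte[length];
        data.Read(buffer, 0, (int)length);
        using (var ms = new MemoryStream(buffer))
            return Serializer.Deserialize<List<DataPoint>>(ms);
    }
}
```

In other words, the (offset, length) lookup in the index would replace the fixed 16-byte arithmetic of (a), since each serialized day varies in size.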
To give a better idea of the magnitude of the data: each binary file is about 3 GB in size and thus contains many millions of serialized objects. Each file holds about 1000 days' worth of data, and each data request may ask for any number of days' worth.
What in your opinion is faster in raw processing time? I wanted to garner some input before potentially writing a lot of code to implement (b); I currently have (a) in place and can process about 1.5 million objects per second on my machine (process = from data request to returned List of deserialized objects).
Summary: I am asking whether binary data can be read (I/O) and deserialized faster using approach (a) or approach (b).