
I have a bit of code in C++ that writes structs to a file. The format of the struct is:

    struct dataHeader
    {
        int headerID;
        int numberOfDataLines;
    };

    struct data
    {
        double id;
        double type;
        char name[100];
        char yyy[100];
    };

Now, these two structs are always written in pairs and a file contains upwards of 50000 of these pairs.

My question is: is there a way to do this more efficiently? The file size is a major concern for me.

EDIT: The current code is a simple fwrite in a loop (pseudo-code):

    while (dataBlock.Next())
    {
        // one header per block...
        fwrite(&_dataHeader, sizeof(dataHeader), 1, fpbinary);

        // ...followed by each data line of that block
        while (dataLine.Next())
        {
            fwrite(&_data[i], sizeof(data), 1, fpbinary);
        }
    }

Thanks.

user2822838
  • Can you show the "inefficient" code you're already using? It's hard to suggest a more efficient way without knowing what you're already doing... – Wooble Feb 13 '14 at 13:17
  • @user2822838, using compression should be simpler than having to reorganize your code. Since you use C++, you might want to use a C++ compression library instead of zlib directly. – Agnel Kurian Feb 13 '14 at 13:25
  • If I use a compression library, wouldn't there be an overhead when reading and updating the binary file? – user2822838 Feb 13 '14 at 13:33
  • @user2822838, the overhead of compression would surely be less than the overhead of reading and writing more data to the disk!! – Shahbaz Feb 13 '14 at 13:45
  • @Shahbaz: At the moment, I am able to read a particular block of data by just skipping the file pointer to it. Presumably, this wouldn't be possible when using the compressed file and additionally would require the entire file to be decompressed? – user2822838 Feb 13 '14 at 14:04
  • That largely depends on the compression algorithm. From a quick look at [zlib's manual](http://zlib.net/manual.html), the `gzseek` function looks like what you would need. Obviously, the seek has some extra computation, but all in all, even if a large file only compresses to 90% of its size, the reduction in disk access may well save more time than the compression-related computation consumes. – Shahbaz Feb 13 '14 at 14:14
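
A rough sketch of the `gzseek` idea discussed in the comment above (this is not from the original post; it assumes zlib's gzip file API, reuses the question's structs, and the offset/record layout is purely illustrative):

    // Rough sketch: random access into a gzip-compressed file via zlib's gz* API.
    // blockOffset is the byte offset of a block in the *uncompressed* stream.
    #include <cstdio>
    #include <zlib.h>

    bool readRecord(const char* path, long blockOffset, int n,
                    dataHeader& hdr, data& rec)
    {
        gzFile gz = gzopen(path, "rb");
        if (!gz)
            return false;

        // gzseek offsets refer to the uncompressed data; zlib decompresses
        // up to that point internally, so seeking is not free.
        gzseek(gz, blockOffset, SEEK_SET);
        gzread(gz, &hdr, sizeof hdr);

        // Skip to the n-th data record of the block, then read it.
        gzseek(gz, static_cast<long>(n) * static_cast<long>(sizeof(data)), SEEK_CUR);
        gzread(gz, &rec, sizeof rec);

        gzclose(gz);
        return true;
    }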

2 Answers


You may reduce your data storage requirements by grouping your data if the records share common values. For example, you could collect the distinct "name" or "yyy" values and write the records in groups: first all records whose name is "Bob", then all whose name is "Josh", so each repeated string only needs to be stored once per group.
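
As a rough illustration of that grouping idea (not from the original answer; `dataNoName`, `writeGrouped`, and the on-disk layout are assumptions, and it assumes the `name` fields are null-terminated):

    // Bucket the records by name and write each distinct name only once,
    // followed by a count and the name-less payloads of that group.
    #include <cstdio>
    #include <cstring>
    #include <map>
    #include <string>
    #include <vector>

    struct dataNoName            // "data" with the name field factored out
    {
        double id;
        double type;
        char   yyy[100];
    };

    void writeGrouped(FILE* fp, const std::vector<data>& records)
    {
        std::map<std::string, std::vector<dataNoName> > groups;
        for (size_t i = 0; i < records.size(); ++i)
        {
            dataNoName d;
            d.id   = records[i].id;
            d.type = records[i].type;
            std::memcpy(d.yyy, records[i].yyy, sizeof d.yyy);
            groups[records[i].name].push_back(d);
        }

        for (std::map<std::string, std::vector<dataNoName> >::const_iterator it = groups.begin();
             it != groups.end(); ++it)
        {
            char name[100] = {0};
            std::strncpy(name, it->first.c_str(), sizeof name - 1);
            int count = static_cast<int>(it->second.size());

            fwrite(name, sizeof name, 1, fp);     // name written once per group
            fwrite(&count, sizeof count, 1, fp);  // how many records follow
            fwrite(&it->second[0], sizeof(dataNoName), it->second.size(), fp);
        }
    }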

If all of your data are unique, the only remaining option is to compress your binary data before writing it to the file and decompress it after reading. I suggest QuickLZ, which is quite fast at both compression and decompression.
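
A minimal sketch of what that could look like, assuming QuickLZ 1.5's C API (`qlz_compress` plus the "+400 bytes" destination-buffer rule; check the QuickLZ documentation for the exact details):

    // Compress one block of records with QuickLZ before writing it.
    #include <cstdio>
    #include <vector>
    #include "quicklz.h"

    void writeCompressedBlock(FILE* fp, const std::vector<data>& block)
    {
        size_t rawSize = block.size() * sizeof(data);

        // QuickLZ needs a destination buffer of at least source size + 400 bytes.
        std::vector<char> compressed(rawSize + 400);
        qlz_state_compress state = qlz_state_compress();

        size_t compressedSize =
            qlz_compress(&block[0], &compressed[0], rawSize, &state);

        // Store the compressed size so the reader knows how much to read back
        // before calling qlz_decompress().
        fwrite(&compressedSize, sizeof compressedSize, 1, fp);
        fwrite(&compressed[0], 1, compressedSize, fp);
    }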

Semih Ozmen

You can try compressing the content of the file if your timing requirements are not too strict.

How can I easily compress and decompress files using zlib?
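
For reference, a minimal sketch of compressing a block of records in memory with zlib's `compress2` before writing it (the function and on-disk layout here are illustrative, not taken from the linked answer):

    // Compress a block of records with zlib, then write both sizes and the
    // compressed bytes so the reader can allocate buffers and call uncompress().
    #include <cstdio>
    #include <vector>
    #include <zlib.h>

    bool writeCompressed(FILE* fp, const std::vector<data>& block)
    {
        uLong rawSize = static_cast<uLong>(block.size() * sizeof(data));

        // compressBound() gives the worst-case compressed size.
        uLongf compressedSize = compressBound(rawSize);
        std::vector<Bytef> compressed(compressedSize);

        int rc = compress2(&compressed[0], &compressedSize,
                           reinterpret_cast<const Bytef*>(&block[0]), rawSize,
                           Z_BEST_COMPRESSION);
        if (rc != Z_OK)
            return false;

        fwrite(&rawSize, sizeof rawSize, 1, fp);
        fwrite(&compressedSize, sizeof compressedSize, 1, fp);
        fwrite(&compressed[0], 1, compressedSize, fp);
        return true;
    }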

Jose Palma