
I have a large array in memory. I am writing it to a file using:

    FILE* fp = fopen("filename", "wb");
    fwrite(array, sizeof(uint32_t), 1500000000, fp); // array saved
    fflush(fp);
    fclose(fp);

and reading it again using:

    FILE* fp = fopen("filename", "rb");
    fread(array, sizeof(uint32_t), 1500000000, fp);
    fclose(fp);

Writing takes 7 seconds and reading takes 5 seconds.

Actually, I don't need to write the whole array. I have to write and read it element by element, depending on some conditions. For example:

#include <iostream>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

using namespace std;

int main()
{
    uint32_t* ele = new uint32_t[100];
    for (int i = 0; i < 100; i++)
        ele[i] = i;

    for (int i = 0; i < 100; i++) {
        if (ele[i] < 20)
            continue;
        // write ele[i] to file
    }

    for (int i = 0; i < 100; i++) {
        if (ele[i] < 20)
            continue;
        // read number from file
        // ele[i] = number * 10;
    }

    std::cin.get();
    delete[] ele;
}

So what I am doing is:

writing using:

for (int i = 0; i < 1500000000; i++) {
    if (arrays[i] < 10000000)
        continue;
    uint32_t number = arrays[i];
    fwrite(&number, sizeof(uint32_t), 1, fp1);
}

And reading using: fread(&number, sizeof(uint32_t), 1, fp1);

In this case, writing takes 2.13 min and reading takes 1.05 min.

That is quite a long time for me. Can anybody explain why this is happening (in the second case the file is smaller than in the first)? And how can I solve this issue? Is there a better approach?

asked by alessandro (edited by sashoalm)
  • probably copy paste problem - you should open the file with `rb` when reading it. – Ivaylo Strandjev Jan 24 '13 at 10:23
  • 1
    Can you produce a minimal but self-contained test case that demonstrates the problem. That would enable us to experiment with your exact code rather than second-guessing. – NPE Jan 24 '13 at 10:24
  • @IvayloStrandjev, sorry. I opened it in rb mode. But, in above code, I just make mistake. – alessandro Jan 24 '13 at 10:25
  • Have you tried using an intermediate array to store all the valid values and then writing that array to file in a single operation? – Gorpik Jan 24 '13 at 10:27
  • @Gorpik, that is impossible for me. I have very limited memory. – alessandro Jan 24 '13 at 10:28
  • Then I don't see how can you improve on that, sorry. Such a large number of file I/O operations slow things a lot. – Gorpik Jan 24 '13 at 10:30
  • @alessandro - you mentioned in another comment that your main concern is reading. Can you describe your access pattern for this - you say you read a single `uint32_t`, but didn't mention whether you read them all, sequentially, or just some of them ... Also, there may be faster I/O primitives available if you tell us what platform you're using. – Useless Jan 24 '13 at 10:34
  • @Useless, sorry, I forgot. I am using Linux. And Access pattern is Sequential all the time and I have to read whole file sequentially (not part of it). – alessandro Jan 24 '13 at 10:38
  • @Useless, but issue is I can't take whole file in an array because I have limited memory. – alessandro Jan 24 '13 at 10:40
  • Linux on regular x86 PC architecture, or some embedded platform, or ...? Also, how much memory _is_ available? – Useless Jan 24 '13 at 10:59
  • @Useless, Linux on regular x86 PC architecture. 32 GB is available. But, I have already used approx 20 GB. – alessandro Jan 24 '13 at 11:10
  • 1
    @alessandro, you'll need to describe your scenario a bit better. Do you need to read the entire file, linearly? Do you need to keep it all in-memory, or can you do read-process-write? Why is the conditional reads/writes done? - it's hard to give much meaningful advice from your original question. – snemarch Jan 24 '13 at 11:20

4 Answers

2

I benchmarked this a little while ago, and on my box lots of small fwrite() calls can only sustain about 90 MB/s (the disk is much faster than this so the test was not disk-bound).

My suggestion would be to do your own buffering: write the values into an intermediate array, and from time to time write out the entire array using a single fwrite().

NPE
1

Writing just once will be way faster. I would suggest you construct an auxiliary array with just the elements you want to write, and then write this array out in a single fwrite call. Of course this will take additional memory, but that's the standard tradeoff: memory for performance.

Ivaylo Strandjev
1

Even though C's FILE* routines are buffered, there's still a fair amount of overhead to each call - ending up doing millions of integer-sized reads/writes will kill your performance.

EDIT: are you doing integer-sized reads as an attempt at speed optimization? Or are you doing it for data-consistency reasons (i.e., an integer in the array must only be updated if the condition is true)?

If it's for consistency reasons, consider reading a chunk (probably 4k or larger) at a time, then do the compare-and-possibly-update from the chunk of data - or use memory mapped files, if it's available on your target platform(s).

snemarch
  • Any possible method for writing and reading it back fast. Main concern is reading. – alessandro Jan 24 '13 at 10:29
  • Read larger chunks of data at a time - or, if you have very random I/O patterns and don't need to be super portable, consider memory-mapped I/O. – snemarch Jan 24 '13 at 10:56
  • Can you kindly give a simple example of "memory-mapped I/O" for the above case ? I can't find any by googling it. – alessandro Jan 24 '13 at 11:12
  • for Windows, search for CreateFileMapping - for *nix you'll want POSIX mmap. – snemarch Jan 24 '13 at 11:16
0

The title of the question says C++, so why not use the excellent buffered stream facilities? Does C++ ofstream file writing use a buffer?

Szocske
  • C-style FILE* has buffering - the problem is (likely) the CPU overhead of doing millions of integer-sized calls. – snemarch Jan 24 '13 at 11:17
  • Thus: application level buffering, like those in the C++ stream libraries :-) Or any other way. – Szocske Jan 27 '13 at 14:31