
I have a very large (950 GB) binary file in which I store 1 billion sequences of floating-point values.

A small example of the type of file I have with sequences of length 3 could be:

-3.456 -2.981 1.244
2.453 1.234 0.11
3.45 13.452 1.245
-0.234 -1.983 -2.453

Now, I want to read a particular sequence (let's say the sequence with index=2, therefore the 3rd sequence in my file) so I use the following code:

#include <iostream>
#include <fstream>
#include <stdlib.h>

using namespace std;

int main (int argc, char** argv){

  if(argc < 4){
    cout << "usage: " << argv[0] << " <input_file> <length> <ts_index>" << endl;
    exit(EXIT_FAILURE);
  }

  ifstream in (argv[1], ios::binary);
  int length = atoi(argv[2]);
  int index = atoi(argv[3]);

  float* ts = new float [length];

  in.clear();
  in.seekg(index*length*sizeof(float), in.beg);
  if(in.bad())
    cout << "Error\n";
  // for(int i=0; i<index+1; i++){
  in.read(reinterpret_cast<char*> (ts), sizeof(float)*length);
  // }
  for(int i=0; i<length; i++){
    cout << ts[i] << " ";
  }

  cout << endl;
  in.close();
  delete [] ts;
  return 0;
}

The problem is that when I use seekg, the read fails for some indexes and I get a wrong result. If I instead read the file sequentially (without seekg) and print the wanted sequence, I always get the correct result.

At first I suspected an overflow in seekg (since the byte offset can be very large), but I saw that seekg takes a streamoff argument, which can hold huge values (billions of billions).

David
  • The major problem is that you use `int` for the offsets, while [`seekg`](http://en.cppreference.com/w/cpp/io/basic_istream/seekg) expects `off_type` or `pos_type`, which are most likely *not* aliases of `int` (`off_type` is typically `std::streamoff`, a signed type that is usually 64 bits wide). On all major modern platforms, even 64-bit ones, `int` is still a 32-bit type, which is not adequate for such large numbers. – Some programmer dude Nov 25 '14 at 12:55
  • Are your integers 32 bit? Perhaps the integer math `index*length*sizeof(float)` is overflowing. – drescherjm Nov 25 '14 at 12:55
  • Your comments are both right. Adding a (streamoff) cast resolves the problem. I thought the conversion was automatic, since seekg takes a streamoff type. – David Nov 25 '14 at 13:30
  • The cast is automatic but it happens after the integer multiplication and thus after the overflow. – uesp Nov 25 '14 at 15:24
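
The point made in the comments can be checked with a small sketch. The helper names below are hypothetical, and the sample numbers used afterwards are assumptions: a length of about 237 floats is roughly what 950 GB spread over 1 billion sequences implies, and the index is arbitrary. Signed overflow is undefined behaviour in C++, so the sketch never performs the overflowing `int` multiplication; it does the arithmetic in 64 bits and checks whether the intermediate product `index*length` would still fit in an `int`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: number of floats before sequence `index`,
// computed in 64 bits so the product itself cannot overflow here.
std::int64_t element_count(int index, int length) {
    assert(index >= 0 && length >= 0);
    return static_cast<std::int64_t>(index) * length;
}

// True if index*length does not fit in a plain int. In the expression
// index*length*sizeof(float), the left-most product is int*int, so this
// is the step that overflows, before sizeof(float) is applied.
bool int_product_overflows(int index, int length) {
    return element_count(index, length) > std::numeric_limits<int>::max();
}

// Byte offset of sequence `index`, computed entirely in 64 bits.
std::int64_t byte_offset(int index, int length) {
    return element_count(index, length)
         * static_cast<std::int64_t>(sizeof(float));
}
```

With a 32-bit `int`, `int_product_overflows(10000000, 237)` is true because 2,370,000,000 exceeds 2,147,483,647, whereas small indexes such as `(2, 3)` are unaffected. That would be consistent with the read failing only for some indexes.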

1 Answer


Changing the line

in.seekg(index*length*sizeof(float), in.beg);

into

in.seekg((streamoff)index*length*sizeof(float), in.beg);

solved the problem.

David