
I am attempting to port a small data analysis program from a 64 bit UNIX to a 32 bit Windows XP system (don't ask :)). But now I am having problems with the 2GB file size limit (long not being 64 bit on this platform).

I have searched this website and others for possible solutions but cannot find any that translate directly to my problem. The problem is in the use of fseek and ftell.

Does anyone know of a modification to the following two functions that makes them work on 32 bit Windows XP for files larger than 2GB (actually on the order of 100GB)?

It is vital that the return type of nsamples is a 64 bit integer (possibly int64_t).

long nsamples(char* filename)
{
  FILE *fp;
  long n;

  /* Open file */
  fp = fopen(filename, "rb");

  /* Find end of file */
  fseek(fp, 0L, SEEK_END);

  /* Get number of samples */
  n = ftell(fp) / sizeof(short);

  /* Close file */
  fclose(fp);

  /* Return number of samples in file */
  return n;
}

and

void readdata(char* filename, short* data, long start, int n)
{
  FILE *fp;

  /* Open file */
  fp = fopen(filename, "rb");

  /* Skip to correct position */
  fseek(fp, start * sizeof(short), SEEK_SET);

  /* Read data */
  fread(data, sizeof(short), n, fp);

  /* Close file */
  fclose(fp);
}

I tried using _fseeki64 and _ftelli64 using the following to replace nsamples:

__int64 nsamples(char* filename)
{
  FILE *fp;
  __int64 n;
  int result;

  /* Open file */
  fp = fopen(filename, "rb");
  if (fp == NULL)
  {
    perror("Error: could not open file!\n");
    return -1;
  }

  /* Find end of file */
  result = _fseeki64(fp, (__int64)0, SEEK_END);
  if (result)
  {
    perror("Error: fseek failed!\n");
    return result;
  }

  /* Get number of samples */
  n = _ftelli64(fp) / sizeof(short);

  printf("%I64d\n", n);

  /* Close file */
  fclose(fp);

  /* Return number of samples in file */
  return n;
}

For a file of 4815060992 bytes I get 260046848 samples (i.e. _ftelli64 gives 520093696 bytes), which is strange.

Curiously when I leave out the (__int64) cast in the call to _fseeki64 I get a runtime error (invalid argument).

Any ideas?

Pim Schellart
  • What compiler are you using? gcc? Visual (something)? Something else? – Eric Towers Oct 26 '10 at 00:01
  • I am using MinGW ("cannot" use VS since the functions I am writing are part of an f2py Python extension module). The Win32 API might be an option if it could be easily integrated into this function without adding too many dependencies (as you can probably tell I am not that familiar with Windows :)) – Pim Schellart Oct 27 '10 at 14:19
  • I have posted a more specific question as well, if that gets answered I'll add the final solution here too. – Pim Schellart Oct 27 '10 at 14:20

4 Answers


Sorry for not posting sooner, but I have been preoccupied with other projects for a while. The following solution works:

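#include <io.h>     /* _open, _lseeki64, _close */
#include <fcntl.h>  /* _O_RDONLY, _O_BINARY */
#include <stdio.h>  /* SEEK_END */

__int64 nsamples(char* filename)
{
  int fh;
  __int64 n;

  /* Open file with the low-level (descriptor based) API */
  fh = _open(filename, _O_RDONLY | _O_BINARY);
  if (fh == -1)
    return -1;

  /* Seek to end of file; the return value is the file size in bytes */
  n = _lseeki64(fh, 0, SEEK_END);

  /* Close file */
  _close(fh);

  /* Return number of samples in file, or -1 if the seek failed */
  return (n < 0) ? -1 : n / (__int64)sizeof(short);
}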

The trick was using _open instead of fopen to open the file. I still don't understand exactly why this has to be done, but at least this works now. Thanks to everyone for your suggestions which eventually pointed me in the right direction.
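The question also asked about readdata, which the code above does not cover. Here is a minimal sketch of how the same descriptor-based approach might extend to it. The name readdata64 and the RD_* macros are made up for illustration, and the non-Windows branch exists only so the sketch compiles elsewhere. It assumes n * sizeof(short) fits in an int and that a single read call returns everything that is available, which is typical for regular files but not guaranteed.

```c
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t on POSIX builds */
#endif
#include <fcntl.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>             /* SEEK_SET */

#ifdef _WIN32
#include <io.h>
#define RD_OPEN(path)       _open((path), _O_RDONLY | _O_BINARY)
#define RD_SEEK(fd, off)    _lseeki64((fd), (off), SEEK_SET)
#define RD_READ(fd, buf, n) _read((fd), (buf), (unsigned int)(n))
#define RD_CLOSE(fd)        _close(fd)
#else
#include <sys/types.h>
#include <unistd.h>
#define RD_OPEN(path)       open((path), O_RDONLY)
#define RD_SEEK(fd, off)    lseek((fd), (off_t)(off), SEEK_SET)
#define RD_READ(fd, buf, n) read((fd), (buf), (size_t)(n))
#define RD_CLOSE(fd)        close(fd)
#endif

/* Hypothetical 64-bit variant of readdata(): reads up to n samples starting
   at sample index start. Returns the number of samples read, or -1 on error. */
int readdata64(const char *filename, short *data, int64_t start, int n)
{
    int fd;
    long got;

    /* Reject arguments whose byte count would not fit in an int */
    if (start < 0 || n < 0 || (size_t)n > INT_MAX / sizeof(short))
        return -1;

    fd = RD_OPEN(filename);
    if (fd < 0)
        return -1;

    /* Compute the byte offset in 64-bit arithmetic before seeking */
    if (RD_SEEK(fd, start * (int64_t)sizeof(short)) < 0) {
        RD_CLOSE(fd);
        return -1;
    }

    got = (long)RD_READ(fd, data, (size_t)n * sizeof(short));
    RD_CLOSE(fd);

    return (got < 0) ? -1 : (int)(got / (long)sizeof(short));
}
```

The important detail is that start is widened to 64 bits before it is multiplied by sizeof(short), so the offset never passes through a 32-bit long.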

Pim Schellart

There are two functions called _fseeki64 and _ftelli64 that support longer file offsets even on 32 bit Windows:

int _fseeki64(FILE *stream, __int64 offset, int origin);

__int64 _ftelli64(FILE *stream);
Codo
  • I tried this but it doesn't seem to return the right values (see post) – Pim Schellart Oct 25 '10 at 12:33
  • What compiler do you use? VisualStudio? And what version? – Codo Oct 25 '10 at 14:47
  • I use the latest version of MinGW (basically GCC 4.5). As the package I am compiling is a Python extension with f2py and I have no clue how to compile that with VisualStudio. – Pim Schellart Oct 25 '10 at 16:57
  • I've never used MinGW so I'm afraid I can't really help. But have you looked at ftello and fseeko, which are 64 bit versions of ftell and fseek available in Unix-like libraries? – Codo Oct 25 '10 at 17:17
  • I have looked at these functions (fseeko, ftello) but am unsure whether they also work on Windows (on my 64 bit UNIX machine there is no problem since I can just use fseek and ftell with the 64 bit long; it really is a Windows-specific issue). – Pim Schellart Oct 27 '10 at 14:23

For gcc, see SO question 1035657, where the advice is to compile with the flag -D_FILE_OFFSET_BITS=64 so that the hidden variables (of type off_t) used by the file-positioning functions are 64 bits wide.

For MinGW: "Large-file support (LFS) has been implemented by redefining the stat and seek functions and types to their 64-bits equivalents. For fseek and ftell, separate LFS versions, fseeko and ftello, based on fsetpos and fgetpos, are provided in LibGw32C." (reference). In recent versions of gcc, fseeko and ftello are built-in and a separate library is not needed.
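As a rough illustration of the fseeko/ftello route, here is a sketch of nsamples written against those functions. The name nsamples_lfs is made up for this example, and the two feature macros at the top are an assumption about what the toolchain needs: they must appear before the first system header to have any effect, and whether a particular MinGW release honours _FILE_OFFSET_BITS is something to verify locally.

```c
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64      /* request a 64-bit off_t */
#endif
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose fseeko/ftello under strict -std= modes */
#endif
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical replacement for nsamples(): returns the number of
   short-sized samples in the file, or -1 on error. */
int64_t nsamples_lfs(const char *filename)
{
    FILE *fp;
    off_t size;

    fp = fopen(filename, "rb");
    if (fp == NULL)
        return -1;

    /* Seek to the end using the off_t based call */
    if (fseeko(fp, 0, SEEK_END) != 0) {
        fclose(fp);
        return -1;
    }

    /* File position at the end equals the size in bytes */
    size = ftello(fp);
    fclose(fp);

    if (size < 0)
        return -1;
    return (int64_t)size / (int64_t)sizeof(short);
}
```

If sizeof(off_t) is still 4 after compiling this way, the flag was not picked up and the result will wrap for files over 2GB.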

Eric Towers
  • @Pim Schellart: I can neither confirm nor deny that. My two currently working gcc setups are Linux/POSIX (within the context of your question). Testing on those, I see LFS behaviour from fseek() and ftell(). So I am unable to test with gcc in a non-POSIX environment. – Eric Towers Oct 26 '10 at 00:45
  • fseeko and ftello are provided with the latest MinGW gcc 4.8.2; you don't need LibGw32C. – JPaget Feb 06 '14 at 01:47

My BC says:

520093696 + 4294967296 => 4815060992

I'm guessing that your print routine is 32-bit. Your offset returned is most likely correct but being chopped off somewhere.
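To make the arithmetic concrete, here is a small sketch that reproduces the wrap-around and prints a 64-bit value with the standard PRId64 macro. The helper names low32 and format_i64 are invented for this example. It only demonstrates what a 32-bit truncation would look like; it does not show where in the original program the truncation happens.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Keep only the low 32 bits, as happens when a 64-bit offset
   passes through a 32-bit type somewhere along the way. */
uint32_t low32(int64_t offset)
{
    return (uint32_t)((uint64_t)offset & 0xFFFFFFFFu);
}

/* Format a 64-bit value with the <inttypes.h> macro instead of a
   compiler-specific specifier. Returns snprintf's result. */
int format_i64(char *buf, size_t len, int64_t value)
{
    return snprintf(buf, len, "%" PRId64, value);
}
```

With these, low32(4815060992) gives 520093696, the byte count reported in the question, and dividing that by sizeof(short) gives the 260046848 samples that were printed.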

dascandy