
I have this idiomatic snippet for getting the length of a binary file:

    fseek(my_file, 0, SEEK_END);
    const size_t file_size = ftell(my_file);

…I know, to be pedantic fseek(file, 0, SEEK_END) has undefined behavior for a binary stream [1] – but frankly on the platforms where this is a problem I also don't have fstat() and anyway this is a topic for another question…

My question is: Should I check the return value of fseek() in this case?

    if (fseek(my_file, 0, SEEK_END)) {

        return 1;

    }

    const size_t file_size = ftell(my_file);

I have never seen the return value of fseek() being checked in a case like this, and I also wonder what kind of error fseek() could possibly return here.

EDIT:

After reading Clifford's answer, I also think that the best way to deal with fseek() and ftell() return values while calculating the size of a file is to write a dedicated function. However Clifford's good suggestion could not deal with the size_t data type (we need a size after all!), so I guess that the most practical approach in the end would be to use a pointer for storing the size of the file, and keep the return value of our dedicated function only for failures. Here is my contribution to Clifford's solution for a safe size calculator:

int fsize (FILE * const file, size_t * const size) {

    long int ftell_retval;

    if (fseek(file, 0, SEEK_END) || (ftell_retval = ftell(file)) < 0) {

        /*  Error  */
        *size = 0;
        return 1;

    }

    *size = (size_t) ftell_retval;
    return 0;

}

So that when we need to know the length of a file we could simply do:

size_t file_size;

if (fsize(my_file, &file_size)) {

    fprintf(stderr, "Error calculating the length of the file\n");
    return 1;

}
madmurphy
  • Be careful. `const size_t file_size = ftell(my_file);` If that file is opened in text mode, `file_size` [won't be accurate](https://port70.net/~nsz/c/c11/n1570.html#7.21.9.4p2): "For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; **the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read**." – Andrew Henle Oct 21 '19 at 15:59
  • And if it's [in binary mode](https://port70.net/~nsz/c/c11/n1570.html#7.21.9.2p3): "A binary stream need not meaningfully support `fseek` calls with a whence value of `SEEK_END`." – Andrew Henle Oct 21 '19 at 16:01
  • The problem here is that, even if `fseek` returns a success code, there is no guarantee in the Standard that the subsequent `ftell` will return anything meaningful. – Adrian Mole Oct 21 '19 at 16:01
  • I don't understand. If a file is removed before the call to `fseek`, it will return an error. A file may also be removed between the calls to `fseek` and `ftell`, in which case `ftell` will return an error. You should always handle errors. Are we assuming a single-process system? It is up to the OS to `ferror` and it may do it anywhere. – KamilCuk Oct 21 '19 at 16:03
  • @Adrian Indeed. For `fseek()`/`ftell()` to successfully return the file size requires implementation-specific guarantees. And once you're relying on implementation-specific guarantees, you might as well use implementation-specific functions that actually do what you want. Here, that would be `stat()` or `fstat()` on POSIX systems, or `GetFileSize()` on Windows systems. – Andrew Henle Oct 21 '19 at 16:04
  • I know, Andrew and Adrian, Microsoft Windows messes up with text files. But my file is opened in binary mode (`fopen(path, "rb")`). I have studied the situation quite a lot now, and given that there is not a portable way to get the length of a file, *using `ftell()` becomes the most portable way*. Andrew, if I am not wrong “not meaningfully support `fseek` calls” on some weird systems from the last century means adding some NUL byte padding – and just that. If that is the case, it's not a problem for me, as NUL bytes get erased everywhere in my program. – madmurphy Oct 21 '19 at 16:06
  • Kamil, good point! – madmurphy Oct 21 '19 at 16:08
  • Yes, but not like that would be my advice. Writing an `fsize()` function implemented using `fseek()/ftell()` with error checking would have been less work than asking this question, and result in better code. Moreover on platforms with `fstat()` or other means, you can use an alternate implementation of `fsize()` without changing the higher level code. – Clifford Oct 21 '19 at 17:05
  • Adding your own solution to a question is [promoted by SO](https://stackoverflow.com/help/self-answer), yet as an _answer_ and not an edit to the question. Recommend to roll back the question as it was before appending an answer and then add your code as an answer below. – chux - Reinstate Monica Oct 22 '19 at 00:46
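
The `fstat()` route mentioned in the comments above could look roughly like the sketch below. It is POSIX-only, and the name `fsize_fstat` and its error convention (0 on success, 1 on failure) are assumptions made for illustration, not something taken from the thread.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno(); this sketch is POSIX-only */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: stores the file size in *size and returns 0,
   or returns 1 on failure. The caller should fflush() any pending
   output first, because fstat() only sees what has reached the
   underlying descriptor. */
int fsize_fstat(FILE *file, off_t *size) {
    struct stat st;
    const int fd = fileno(file);

    if (fd < 0 || fstat(fd, &st) != 0) {
        return 1;
    }
    /* For pipes, terminals and the like st_size is not a file length. */
    if (!S_ISREG(st.st_mode)) {
        return 1;
    }
    *size = st.st_size;
    return 0;
}
```

Unlike the `fseek()`/`ftell()` pair, this does not move the stream's position indicator.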

4 Answers


It's always good practice to test the return value of a function and handle it promptly; otherwise strange behaviour might occur that you won't be able to understand or track down without exhaustive debugging.

You can read about the return value of fseek in the "Return value" section of the fseek documentation.

The cost of this if statement is negligible, while it makes problems easier to deal with when they occur.

David

You need perhaps to ask yourself two questions:

  1. What will ftell() return if fseek() has failed?
  2. Can I handle failure in any meaningful way?

If fseek() fails it returns a non-zero value. If ftell() fails (which it likely will if fseek() has failed), it will return -1L - so is more deterministic, which from an error handling point of view is better.

However there are potentially ways in which fseek() could fail that do not cause ftell() to fail (unlikely perhaps, but the failure modes are implementation defined), so it is better perhaps to test fseek() to be sure you are not getting an erroneous answer from ftell().

Since your aim is to get the file size, and fseek/ftell is just a way of synthesising it, it makes more sense to define a file-size function, so that the caller need only handle the failure to obtain a valid file size rather than the failure of implementation details. The point is that if you want the file size, you don't want to handle errors from fseek(), since it was a means to an end and not directly related to what you need to achieve. Failure of fseek() is a non-deterministic side effect whose consequence is an unknown file size, so it is better to behave "as-if" ftell() had failed, without risking misleading behaviour by actually calling ftell():

long fsize( FILE* file )
{
    long size = -1 ;  // as-if ftell() had failed
    if( fseek( file, 0, SEEK_END ) == 0 )
    {
        size = ftell( file ) ;
    }

    return size ;
}

Then your code will be:

const long file_size = fsize(my_file);

Then at the application level you only need to handle the error file_size < 0; you have no interest in whether fseek() or ftell() failed, just that you don't know the file size.
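
One detail the function above leaves to the caller is that, after a successful call, the position indicator sits at end-of-file, so the stream has to be rewound before reading. A possible variant that restores the position is sketched below; the name `fsize_keep_pos` is hypothetical, and treating a failed restore as an error is a design assumption.

```c
#include <stdio.h>

/* Hypothetical variant: returns the size, or -1 on any failure, and
   tries to leave the position indicator where it found it. */
long fsize_keep_pos(FILE *file) {
    const long saved = ftell(file);
    if (saved < 0) {
        return -1;
    }

    long size = -1;  /* as-if ftell() had failed */
    if (fseek(file, 0, SEEK_END) == 0) {
        size = ftell(file);
    }

    /* Restore the original position; a failed restore is reported too,
       since the caller can no longer trust where the stream is. */
    if (fseek(file, saved, SEEK_SET) != 0) {
        return -1;
    }
    return size;
}
```

The caller-side check stays the same: a negative result means the size is unknown.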

Clifford
  • Thank you, Clifford, very good answer. I see only one problem with it though, and it is that with `size_t file_size = fsize(my_file);` `file_size` will always be a positive number, as `size_t` is an unsigned integral data type, so checking if it is negative will not be the right thing to do. Apparently then it is not possible to use a `size_t` for storing the size of a file returned by `ftell()`, but we have to use `long int` instead – although… does `ftell()` return always a `long int` on every machine? – madmurphy Oct 21 '19 at 18:42
  • @madmurphy The size_t was copied from your usage. Yes, ftell() is a standard library function defined to return long. It is an issue perhaps for files > 2 GB. – Clifford Oct 21 '19 at 19:03
  • Use fseeko and ftello in POSIX or  _fseeki64 and _ftelli64 on Windows for very large file support. – Clifford Oct 21 '19 at 19:08
  • Clifford, I have edited my question to discuss the `size_t` topic. About POSIX/Windows specific solutions, I need my code to be portable basically everywhere, so I would like to stick to the C standard library as much as possible. – madmurphy Oct 21 '19 at 19:16
  • @madmurphy Portability may be better served by conditional compilation wrapped in portability layer functions. Target identifications for most systems are well defined: https://sourceforge.net/p/predef/wiki/Home/ – Clifford Oct 21 '19 at 19:20
  • That might be true… – madmurphy Oct 21 '19 at 19:23
  • @madmurphy Your solution is reasonable, though the mid-point return is making me itch, and I'd suggest using stdbool.h / bool rather than int. – Clifford Oct 21 '19 at 19:23
  • Thanks, Clifford. I had thought of using a boolean, but then I stuck to `int` for two reasons: 1) We are returning *an error code*, not a boolean, and the fact that there is only one possible error returnable is irrelevant (moreover, if we returned a boolean we would better return `1` for success and `0` for failure – if you see this choice as unusual it is because it is unusual to have booleans tout court as return values, as we normally deal with error codes, where a "special value" as zero is needed in a sea of different values); [CONTINUE …] – madmurphy Oct 21 '19 at 19:37
  • [… CONT'D] 2) We might want one day to extend our function and disambiguate the two possible failures – let's say that we might decide one day that our function will return `1` when `fseek()` fails but `2` when `ftell()` fails. – madmurphy Oct 21 '19 at 19:37
  • @madmurphy You don't have to justify your choices to me, but the statement `if (fsize(my_file, &file_size))` treats it as Boolean that's all. – Clifford Oct 22 '19 at 05:47
  • Clifford, `if (fsize(my_file, &file_size))` is a particular usage. I can also do `if (strcmp("foo", "bar"))` and thus treat `strcmp()` as returning a boolean (as if `strcmp()` was named `strings_are_different()`), but that doesn't mean that `strcmp()` returns booleans… – madmurphy Oct 22 '19 at 14:33
  • @madmurphy I appreciate that; I did not mean it did not work, simply that it is bad form. If later, as you suggested, you add additional return codes, this code may then break. Anyway this is way off topic. – Clifford Oct 22 '19 at 22:15
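
The portability-layer idea from the comments above (`fseeko`/`ftello` on POSIX, `_fseeki64`/`_ftelli64` on Windows, behind conditional compilation) might be sketched as follows. The wrapper name `fsize64`, the `int64_t` out-parameter, and the particular target macros tested are assumptions for illustration; on some 32-bit POSIX systems the width of `off_t` may additionally depend on build flags.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes fseeko()/ftello() on POSIX systems */
#include <stdint.h>
#include <stdio.h>

#if !defined(_WIN32) && (defined(__unix__) || defined(__APPLE__))
#include <sys/types.h>  /* off_t */
#endif

/* Hypothetical wrapper: stores the size in *size and returns 0,
   or returns 1 if the size could not be determined. */
int fsize64(FILE *file, int64_t *size) {
#if defined(_WIN32)
    if (_fseeki64(file, 0, SEEK_END) != 0) return 1;
    const __int64 pos = _ftelli64(file);
#elif defined(__unix__) || defined(__APPLE__)
    if (fseeko(file, 0, SEEK_END) != 0) return 1;
    const off_t pos = ftello(file);
#else
    /* Fallback: plain ISO C, limited to the range of long. */
    if (fseek(file, 0, SEEK_END) != 0) return 1;
    const long pos = ftell(file);
#endif
    if (pos < 0) return 1;
    *size = (int64_t) pos;
    return 0;
}
```

Higher-level code calls only `fsize64()`, so the platform-specific branches stay in one place.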

fseek can return an error in the case where the file handle is a pipe (or a serial stream).
At that point, ftell can't even tell you where it's at, because in those circumstances it's more "wherever you go, there you are".
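
As a rough illustration of the pipe case: POSIX documents `fseek()` as failing with `ESPIPE` when the descriptor underlying the stream is a pipe, FIFO or socket. The probe below is a POSIX-only sketch, and the function name `seek_on_pipe_fails` is made up for this example.

```c
#define _POSIX_C_SOURCE 200809L  /* for fdopen(); this sketch is POSIX-only */
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical probe: returns 1 if seeking to the end of a freshly
   created pipe fails with ESPIPE, 0 if it does not, and -1 if the
   pipe or stream could not be set up. */
int seek_on_pipe_fails(void) {
    int fds[2];
    if (pipe(fds) != 0) {
        return -1;
    }

    FILE *reader = fdopen(fds[0], "rb");
    if (reader == NULL) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    errno = 0;
    const int rc = fseek(reader, 0, SEEK_END);
    const int failed = (rc != 0 && errno == ESPIPE);

    fclose(reader);  /* also closes fds[0] */
    close(fds[1]);
    return failed;
}
```

A size function that ignores the return value of `fseek()` would carry on to `ftell()` here, which is exactly the situation a check avoids.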

David
sjcaged

Yes, check return value, yet be more careful with type changes.

Note that the range of size_t may be more or less than 0...LONG_MAX.

#include <stdint.h>  // SIZE_MAX
#include <stdio.h>

// function returns an error flag
int fsize (FILE * file, size_t *size) {
  if (fseek(file, 0, SEEK_END)) {
    return 1; // fseek error  
  }

  long ftell_retval = ftell(file);
  if (ftell_retval == -1) {
    return 1; // ftell error
  }

  // Test if the file size fits in a `size_t`.
  // Improved type conversions here.
  // Portably *no* overflow possible.
  if (ftell_retval < 0 || (unsigned long) ftell_retval > SIZE_MAX) {
    return 1; // range error
  }

  *size = (size_t) ftell_retval;
  return 0;
}

Portability

Direct conversion of a long to size_t, and vice versa, is hard to do portably because the relationship between LONG_MAX and SIZE_MAX is not defined: LONG_MAX may be <, ==, or > SIZE_MAX.

Instead first test for < 0, then, if positive, convert to unsigned long. C specifies that LONG_MAX <= ULONG_MAX, so we are OK here. Then compare the unsigned long to SIZE_MAX. Since both types are some unsigned type, the compare simply converts to the wider of the two. Again no range loss.
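
The two-step check described above can be isolated into a small helper, which makes it easy to test on its own. This is only a sketch; the name `long_to_size` and the 0/1 return convention are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts a long to size_t without range loss.
   Returns 0 on success, 1 if value is negative or exceeds SIZE_MAX. */
int long_to_size(long value, size_t *out) {
    /* Step 1: reject negatives while the value is still signed. */
    if (value < 0) {
        return 1;
    }
    /* Step 2: compare as unsigned types; the narrower operand is
       converted to the wider one, so nothing is truncated. */
    if ((unsigned long) value > SIZE_MAX) {
        return 1;
    }
    *out = (size_t) value;
    return 0;
}
```

On platforms where size_t is at least as wide as unsigned long the second test can never fire, and a compiler may say so in a warning.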

chux - Reinstate Monica
  • chux, it is not needed to check for `LONG_MAX` and `SIZE_MAX`, because although it is true that the `long int` ***data type*** might be larger than the `size_t` data type, ***the value*** returned by `ftell()` (yes, passed as a `long int`) will never exceed `SIZE_MAX`. And if in some strange drunk implementation on some alien platform this ever happened, doing `*size = (size_t) ftell_retval` as in the code above would be more than enough to truncate it. – madmurphy Oct 21 '19 at 23:29
  • @madmurphy "value returned by ftell() (yes, passed as a long int) will never exceed SIZE_MAX" is not supported by C. Files sizes can regularly exceed `SIZE_MAX`. `SIZE_MAX` is a memory (RAM) limit, not a file system one. One of the reasons for `fgetpos(), fsetpos()` – chux - Reinstate Monica Oct 21 '19 at 23:30
  • @madmurphy The above code would not truncate with `*size = (size_t) ftell_retval;` due to the prior `(unsigned long) ftell_retval > SIZE_MAX` test. – chux - Reinstate Monica Oct 21 '19 at 23:32
  • “*Files sizes can regularly exceed `SIZE_MAX`*” Agree, but then you will have few weapons against them, as both `malloc()` and `fread()` require a `size_t` argument for the size of the file/memory-to-allocate (that means you will have to read the file in chunks, but in that case – big files – you could not rely on `ftell()` either for getting the right size, as you might get a truncated value even when using a `long int` bigger than a `size_t`). Unfortunately we don't have a `fsize_t` data type in C, so it will always be a lost battle whatever we choose (`off_t` is POSIX-only). – madmurphy Oct 21 '19 at 23:55
  • “*The above code would not truncate with …*” Yes, your code would return an error before reaching that point, my code would simply truncate it. It's not completely wrong to insert your `ftell_retval > SIZE_MAX` check. What I am trying to say is that in case of very large files I would never rely on a `long signed int` either (which means: no `ftell()`). – madmurphy Oct 22 '19 at 00:12
  • P.S. “*It may be `<,==, >`*” It will never be `==`, as one is a signed type, and the other is an unsigned type, so `long int` and `size_t` will *always* be `!=` on every platform. The most likely scenario is that `size_t` will be at least twice as big as a `long int`, but often it will be even bigger. Although the C standard says that `size_t` must be at least 16-bit, while `long int` must be at least `32-bit`, normally `size_t` is implemented as `unsigned long` or `unsigned long long`. – madmurphy Oct 22 '19 at 00:23
  • @madmurphy Re: "It will never be ==, as one is a signed type, and the other is an unsigned type," C does not require `ULONG_MAX > LONG_MAX` `ULONG_MAX == LONG_MAX` is a possibility - yet rarely seen as `unsigned long` would have padding. My answer was not limited to likelihood, but portability supported and allowed by the C standard. – chux - Reinstate Monica Oct 22 '19 at 00:28
  • That's an interesting point. But I guess too theoretical to be considered… Btw, if you look at [the source code of OpenBSD's `ftell()`](https://android.googlesource.com/platform/bionic/+/donut-release/libc/stdio/ftell.c), among the comments you might notice that someone wrote this statement: *“sizeof(off_t) != sizeof(long) on all arches”*, which basically means: “Hey, careful with `ftell()`, it might yield a truncated value. Use `ftello()` and `off_t` instead of a `long int`.” – madmurphy Oct 22 '19 at 00:38
  • @madmurphy "I also wonder what kind of error fseek() could possibly ever return here." invited seeking not just the usual suspects, but a more far-reaching interest in what is possible. This answer addressed those more far-reaching concerns, which, with experience, will not seem so theoretical. C has survived over 40 years because it adapts to new and novel implementations so readily, so coding to the C standard gives your code greater longevity. – chux - Reinstate Monica Oct 22 '19 at 00:51
  • chux, I am more than convinced now that it's a good idea to test both `fseek()` and `ftell()` for any kind of possible errors ;) – madmurphy Oct 22 '19 at 00:56