I open a file:

FILE *fp = fopen("hello_world.txt", "rb");

which just has the contents Hello World!

Then I get the size and reset to the beginning:

fseek(fp, 0L, SEEK_END);
size_t sz = ftell(fp);
fseek(fp, 0L, SEEK_SET);

When I go to perform a read, it does not seem to work. read(fileno(fp), buffer, 100) returns 0.

However, if I instead do:

fread(buffer, 100, 1, fp)

This does indeed read into the buffer correctly.

Even stranger, if I change the offset for the first fseek call to 1, it works completely fine (despite being past the end of file). I'm wondering why this is happening. My initial thought would be that it has to do with clearing the EOF flag, but I thought that should at least be reset when doing fseek back to the start. Not sure why fread works though. It looks like I'm invoking some sort of undefined behavior since some things are varying when running on different machines but I have no idea why.

Here's an MCVE:

#include <stdio.h>
#include <unistd.h>

int main() {
     FILE *fp = fopen("hello_world.txt", "rb");
     fseek(fp, 0L, SEEK_END); // works fine if offset is 1, but read doesn't get any bytes if offset is 0
     size_t sz = ftell(fp);
     fseek(fp, 0L, SEEK_SET);
     char buffer[100];
     size_t chars_read = read(fileno(fp), buffer, 100);
     printf("Buffer: %s, chars: %lu", buffer, chars_read);
     fclose(fp);
     return 0;
 }
alk
rb612
  • Why do you not just use fseek one time fseek(fp, 13, SEEK_SET); like this?.. SEEK_SET => Beginning of file SEEK_CUR => Current position of the file pointer SEEK_END => End of file – Mark Apr 14 '19 at 08:43
  • Mixing stdio and low level read/write on the same open file is a really bad idea. Stick with one or the other. – Shawn Apr 14 '19 at 09:05
  • You mean `open()`? – Shawn Apr 14 '19 at 09:11
  • Use open, read, lseek *or* fopen, fread, fseek. Why do you think you need to mix them? – n. m. could be an AI Apr 14 '19 at 09:11
  • @n.m. being pretty new to C, I don't think I realized they weren't interchangeable. I thought the API would distinguish them more if they had an impact on whether or not they're mixed. I'm doing work with sockets and pipes so I've also been using the syscalls. But I suppose this is a good lesson on why not to mix! – rb612 Apr 14 '19 at 10:02

1 Answer

The problem is subtle, but it boils down to:

Do not mix stream level input/output and positioning calls with low level system calls on the underlying system handle.

Here is a potential explanation of the actual problem:

  • fseek(fp, 0L, SEEK_END); uses a system call lseek(fileno(fp), 0L, 2); to determine the length of the file associated with the system handle. The length returned by the system is 12, smaller than the stream buffer size, so fseek() resets the system handle position, reads the 12 bytes into the buffer (leaving the system handle position at 12), and sets the stream's internal file position to 12.
  • ftell(fp); returns the stream's internal file position, 12. It does so because the stream is opened in binary mode (which is not recommended for text files because end of line sequences will not be translated into newline characters '\n' on legacy systems).
  • fseek(fp, 0L, SEEK_SET); sets the stream's internal file position to 0, which is inside the currently buffered contents, so it does not issue an lseek() system call that would move the system handle back to 0.
  • read(fileno(fp), buffer, 100); cannot read anything because the current position for the system handle is at 12, the end of file.
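  • fread(buffer, 100, 1, fp) would copy the 12 buffered bytes into buffer, then try to read more contents from the file, find that none is available, and return 0, because it returns the number of complete 100-byte elements read. Calling fread(buffer, 1, 100, fp) instead would return the number of characters read, 12.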
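
To make the divergence described in the bullets above visible, one could compare the position the stream reports with the offset the system handle holds. The following is a minimal sketch and not part of the original answer: the helper name stream_vs_descriptor is hypothetical, and the descriptor offset it reports depends on the C library's buffering strategy.

```c
#define _POSIX_C_SOURCE 200809L /* for fileno() when compiling with strict -std=c11 */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: repeats the fseek/ftell/fseek sequence from the
 * question, then reports the stream position and the descriptor offset.
 * Returns 0 on success, -1 on failure. */
int stream_vs_descriptor(const char *path, long *stream_pos, long *fd_pos)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    int rc = -1;
    if (fseek(fp, 0L, SEEK_END) == 0 && ftell(fp) >= 0 &&
        fseek(fp, 0L, SEEK_SET) == 0) {
        /* Position tracked by stdio for this stream. */
        *stream_pos = ftell(fp);
        /* Offset held by the kernel for the descriptor. Seeking by 0 from
         * SEEK_CUR only queries the offset, it does not move it. */
        *fd_pos = (long)lseek(fileno(fp), 0, SEEK_CUR);
        if (*stream_pos >= 0 && *fd_pos >= 0)
            rc = 0;
    }
    fclose(fp);
    return rc;
}
```

With the 12-byte file from the question, a C library that behaves like the trace in this answer would report a stream position of 0 and a descriptor offset of 12, which is why read() finds nothing left. Other C libraries may report a different descriptor offset.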

Conversely, here is what happens if you pass 1 to fseek():

  • fseek(fp, 1L, SEEK_END); uses a system call lseek(fileno(fp), 0L, 2); to determine the length of the file associated with the system handle. The length returned by the system is 12, hence the requested position is 13, still smaller than the stream buffer size. fseek() resets the system handle position and tries to read 13 bytes from the file into the stream buffer, but only 12 bytes are available. fseek() then clears the buffer, issues a system call lseek(fileno(fp), 1L, 2);, and keeps track of the stream's internal file position as 13.
  • ftell(fp); returns the stream internal file position, which is 13.
  • fseek(fp, 0L, SEEK_SET); resets the internal file position to 0, and issues a system call lseek(fileno(fp), 0L, 0); because the position was outside the current stream buffer.
  • read(fileno(fp), buffer, 100); reads the file contents from the system handle current position, which is also 0, hence behaves as expected.
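
Following the rule stated at the top of this answer, one way to avoid depending on any of this is to stay entirely at the system handle level. The following is a minimal sketch under that assumption: the helper name read_with_descriptor and its parameters are hypothetical, and it relies only on the POSIX calls open, lseek, read and close.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: gets the file size and reads up to cap bytes from the
 * start of the file using descriptor calls only, so no stdio buffer is
 * involved. Stores the size in *size. Returns the bytes read, or -1. */
ssize_t read_with_descriptor(const char *path, char *buf, size_t cap, off_t *size)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t n = -1;
    off_t end = lseek(fd, 0, SEEK_END);      /* offset of end of file = size */
    if (end >= 0 && lseek(fd, 0, SEEK_SET) == 0) {
        *size = end;
        n = read(fd, buf, cap);              /* may return fewer than cap bytes */
    }
    close(fd);
    return n;
}
```

Because lseek() and read() act on the same offset, seeking to the end and back leaves read() starting at position 0, whatever the first offset was.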

Notes:

  • This behavior is not guaranteed as the C Standard does not specify the implementation of the stream functions, but it is consistent with the observed behavior.
  • You should check the return values of fseek() and ftell() for failure.
  • Also use %zu for size_t arguments.
  • buffer is not necessarily null terminated, so do not use %s to print its contents with printf. Use %.*s instead and pass (int)chars_read as the precision value.
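
The notes above can be combined into a stream-only version of the question's program. This is a minimal sketch, not the asker's final code: the helper names read_with_stdio and print_result are hypothetical, and the element size passed to fread is 1 so that its return value counts bytes.

```c
#include <stdio.h>

/* Hypothetical helper: gets the file size and reads up to cap bytes using
 * stdio calls only, checking fseek() and ftell() for failure.
 * Returns 0 on success, -1 on failure. */
int read_with_stdio(const char *path, char *buf, size_t cap, long *size, size_t *nread)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    int rc = -1;
    long end = -1;
    if (fseek(fp, 0L, SEEK_END) == 0 && (end = ftell(fp)) >= 0 &&
        fseek(fp, 0L, SEEK_SET) == 0) {
        *size = end;
        *nread = fread(buf, 1, cap, fp);  /* element size 1: result is a byte count */
        if (!ferror(fp))
            rc = 0;
    }
    fclose(fp);
    return rc;
}

/* Prints the bytes read without assuming buf is null terminated:
 * %.*s takes the length as an int precision, %zu matches size_t. */
void print_result(const char *buf, size_t nread, long size)
{
    printf("Buffer: '%.*s', chars: %zu, size: %ld\n", (int)nread, buf, nread, size);
}
```

A caller would declare char buffer[100], call read_with_stdio("hello_world.txt", buffer, sizeof buffer, &size, &nread), and pass the results to print_result only when the return value is 0.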

Here is an instrumented version:

#include <stdio.h>
#include <unistd.h>

#ifndef fileno
extern int fileno(FILE *fp); // in case fileno is not declared
#endif

int main() {
    FILE *fp = fopen("hello_world.txt", "rb");
    if (fp) {
        fseek(fp, 0L, SEEK_END);
        long sz = ftell(fp);
        fseek(fp, 0L, SEEK_SET);
        char buffer[100];
        ssize_t chars_read = read(fileno(fp), buffer, 100);
        printf("\nread(fileno(fp), buffer, 100) = %zd, Buffer: '%.*s', sz = %ld\n", // sz is a long, so use %ld
               chars_read, (int)chars_read, buffer, sz);
        fclose(fp);
    }
    fp = fopen("hello_world.txt", "rb");
    if (fp) {
        fseek(fp, 1L, SEEK_END);
        long sz = ftell(fp);
        fseek(fp, 0L, SEEK_SET);
        char buffer[100];
        ssize_t chars_read = read(fileno(fp), buffer, 100);
        printf("\nread(fileno(fp), buffer, 100) = %zd, Buffer: '%.*s', sz = %ld\n", // sz is a long, so use %ld
               chars_read, (int)chars_read, buffer, sz);
        fclose(fp);
    }
    return 0;
}

Here is a trace of the system calls on Linux, consistent with my tentative explanation. The file hello_world.txt contains Hello world! without a newline, 12 bytes total:

chqrlie$ strace ./rb612-1
...
<removed system calls related to program startup>
...
open("hello_world.txt", O_RDONLY)       = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=12, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5e356ed000
fstat(3, {st_mode=S_IFREG|0644, st_size=12, ...}) = 0
lseek(3, 0, SEEK_SET)                   = 0
read(3, "Hello world!", 12)             = 12
lseek(3, 12, SEEK_SET)                  = 12
read(3, "", 100)                        = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5e356ec000
write(1, "\n", 1
)                       = 1
write(1, "read(fileno(fp), buffer, 100) = "..., 55read(fileno(fp), buffer, 100) = 0, Buffer: '', sz = 12
) = 55
close(3)                                = 0
munmap(0x7f5e356ed000, 4096)            = 0
open("hello_world.txt", O_RDONLY)       = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=12, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5e356ed000
fstat(3, {st_mode=S_IFREG|0644, st_size=12, ...}) = 0
lseek(3, 0, SEEK_SET)                   = 0
read(3, "Hello world!", 13)             = 12
lseek(3, 1, SEEK_CUR)                   = 13
lseek(3, 0, SEEK_SET)                   = 0
read(3, "Hello world!", 100)            = 12
write(1, "\n", 1
)                       = 1
write(1, "read(fileno(fp), buffer, 100) = "..., 68read(fileno(fp), buffer, 100) = 12, Buffer: 'Hello world!', sz =
) = 68
close(3)                                = 0
munmap(0x7f5e356ed000, 4096)            = 0
chqrlie
  • Amazing answer! Thank you so much! I would've never guessed that this was really what happens under the hood, as I'm still pretty new to C. One thing I did notice though is that the return value is 0 for the `fseek` call one past the `EOF`. I would think this would fail based on the explanation, but it looks like [it's allowed](https://stackoverflow.com/a/48585649/3813411). – rb612 Apr 14 '19 at 09:25
  • @rb612: my first explanation was erroneous, I reworded the answer to be consistent with observation on linux, which happens to show the same behavior as OS/X but may have differences in the underlying implementation and sequence of system calls. – chqrlie Apr 14 '19 at 10:26