
I'm trying to understand the glibc implementation of fseek. To do so, I downloaded the glibc source code and tried to understand its function execution order.

I found the fseek implementation in libio/fseek.c. Basically, it calls the function (or rather the macro) _IO_fseek() using the same parameters. This macro is implemented in libio/iolibio.h.

It is defined as _IO_seekoff_unlocked (__fp, __offset, __whence, _IOS_INPUT|_IOS_OUTPUT) (implemented in libio/ioseekoff.c). The next step in its execution is rather confusing to me:

_IO_seekoff_unlocked basically returns _IO_SEEKOFF (fp, offset, dir, mode);, which in turn appears to return _IO_seekoff_unlocked (fp, offset, dir, mode);. That looks like it should create an infinite call loop.

Also, when using strace on an example program (seek.c):

#include <stdio.h>

int main(void) {
    printf("[Fseek] Executing fseek\n");
    FILE *f = fopen("./seek.c", "rb");

    fseek(f, 0L, SEEK_END);
}

it shows that fseek will call the read system call, even though I could not find it in the glibc implementation.

...
write(1, "[Fseek] Executing fseek\n", 24[Fseek] Executing fseek
) = 24
openat(AT_FDCWD, "./seek.c", O_RDONLY)  = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=146, ...}) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=146, ...}) = 0
lseek(3, 0, SEEK_SET)                   = 0
read(3, "#include <stdio.h>\n\nint main(voi"..., 146) = 146
exit_group(0)                           = ?
+++ exited with 0 +++

My goal is to understand how the read system call is used here. I have my own implementation of the read system call, which works well for other tests I wrote but will fail for some reason when it is called via fseek.

As an example, I use fseek in a function to get the size of a file:

long get_file_size(const char *name)
{
    FILE *temp_file = fopen(name, "rb");
    if (temp_file == NULL)
    {
        return -1;
    }

    fseek(temp_file, 0L, SEEK_END);
    long sz =  ftell(temp_file);
    fclose(temp_file);
    return sz;
}

This function will return the correct size with the "normal" read implementation but will fail with mine. So, if anybody can tell me how I can understand the use of read within fseek (which I could not find in the source), I would highly appreciate it.

S.S. Anne
MajorasKid
  • "This function will return the correct size", Hmmm, `ftell()` returns a `long`. This code uses `unsigned long` and omits an error check. Rather than change the type of return, returning a `long` makes more sense. – chux - Reinstate Monica Nov 20 '19 at 11:20
  • May be a read is performed in order to reset the input buffer after a seek? – linuxfan says Reinstate Monica Nov 20 '19 at 11:22
  • `_IO_SEEKOFF` calls `__seekoff` from the jump table – KamilCuk Nov 20 '19 at 11:23
  • Also, strictly speaking your `get_file_size()` does not return the size of the file. Per [**7.21.9.4 The ftell function**, paragraph 2 of the C standard](https://port70.net/~nsz/c/c11/n1570.html#7.21.9.4p2): "For a text stream, its file position indicator contains **unspecified information**, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; **the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read**." – Andrew Henle Nov 20 '19 at 11:23
  • I changed my get_size function from "unsigned long" to long. It didn't change anything (as expected, as the test size is 46byte, so nowhere close to the maximum values). Read might be used to reset the buffer, but for this lseek() would be more suitable, I guess (not sure about this one), and as far as I can tell, _IO_SEEKOFF returns "return _IO_SEEKOFF (fp, offset, dir, mode); " (copied directly from the source file) – MajorasKid Nov 20 '19 at 11:24
  • I would like to remove the part for the size function. That is not interesting, rather boring indeed. What is interesting is why there is a `read` in the `fseek` for text files. – Antti Haapala -- Слава Україні Nov 20 '19 at 11:25
  • For telling the size of the file you need to open it in binary mode. And you'll use `fstat` instead of seek and tell.. – Antti Haapala -- Слава Україні Nov 20 '19 at 11:25
  • Actually I am going to change *that too* because it does not seem to matter. – Antti Haapala -- Слава Україні Nov 20 '19 at 11:27
  • here it is: https://code.woboq.org/userspace/glibc/libio/wfileops.c.html#902 the read call – KamilCuk Nov 20 '19 at 11:28
  • @AndrewHenle I implemented an alternative get_size function using lseek, which gives the correct size. But my problem still exists, I guess? If the current get_size function does not work with my read() implementation, yet it does with the reference implementation, I have an error somewhere, right? Or is it undefined behaviour of the current get_size, and thus cannot be fixed? – MajorasKid Nov 20 '19 at 11:29
  • @AnttiHaapala *For telling the size of the file you need to open it in binary mode.* And that's pedantically [undefined behavior](https://port70.net/~nsz/c/c11/n1570.html#note268): "Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream ..." There's really no portable, strictly-conforming way to get the size of a file in C. Using a binary stream relies on platform-specific guarantees, so you might as well use the platform's facilities that get the size directly, such as `stat()` or `GetFileSize()`. – Andrew Henle Nov 20 '19 at 11:29
  • @AndrewHenle ah good point, I guess it is POSIX-defined though. – Antti Haapala -- Слава Україні Nov 20 '19 at 11:30
  • @AnttiHaapala POSIX does define `ftell()` to return an actual, accurate byte offset. – Andrew Henle Nov 20 '19 at 11:30
  • Anyway, the `get_size` function question is boring and can't be fixed, except by "use stat". The "why there is read in seek" is interesting. The read has nothing to do with your code failing, it is just b0rken. – Antti Haapala -- Слава Україні Nov 20 '19 at 11:31
  • @MajorasKid I don't see any actual errors, assuming a POSIX system - the interesting thing is why your code causes an underlying `read()` call. Just don't return `0` on an error - otherwise you'll confuse empty files with non-existent files. – Andrew Henle Nov 20 '19 at 11:32
  • the cause for the read is this: https://code.woboq.org/userspace/glibc/libio/wfileops.c.html#887 - I'd really wanna remove the get_file_size and just have that one. – Antti Haapala -- Слава Україні Nov 20 '19 at 11:33
  • oke thanks @AnttiHaapala. But yes, would still be interesting to see where the read() call comes from. – MajorasKid Nov 20 '19 at 11:33
  • The function is correct on POSIX, but bad. There is just something totally wrong with your *read system call*, but that would be a separate question, for which you need to add the code for *that*. Hmm I will rephrase the question so that people do not get stuck... – Antti Haapala -- Слава Україні Nov 20 '19 at 11:37
  • Hm, I'm sorry, but I cannot find how the execution of fseek() ends up in _IO_wfile_seekoff. Did I overlook something in the function call order above? I will write a new question for my read() implementation, as it would be too off-topic for this question, I guess. Thanks for any help here anyway! – MajorasKid Nov 20 '19 at 11:37
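
Several comments above suggest asking the platform for the size directly instead of combining fseek and ftell. A minimal sketch of that idea, assuming a POSIX system, could look like the following. The helper name file_size_stat is hypothetical, and treating anything other than a regular file as an error is a choice made for this example only.

```c
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper (not from the question): size of a regular file
   in bytes, obtained with POSIX stat(). Returns -1 on any error, so an
   empty file (0) can be told apart from a missing one (-1). */
off_t file_size_stat(const char *path)
{
    struct stat st;

    if (path == NULL || stat(path, &st) != 0)
        return -1;

    /* st_size is only meaningful here for regular files. */
    if (!S_ISREG(st.st_mode))
        return -1;

    return st.st_size;
}
```

This goes through no stdio buffer, so it should not trigger the lseek/read pair seen in the trace.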

1 Answer


There is no call loop: _IO_seekoff_unlocked calls _IO_SEEKOFF, which actually expands to JUMP3 (__seekoff, FP, OFF, DIR, MODE). JUMP3 is a macro that calls the __seekoff entry from the FILE's "jump" table (its vtable).

By default, fopen assigns _IO_file_jumps (or a related table, since the stream can for example be mmap-backed) as the jump table for a new FILE. That table is the virtual function table that implements the FILE's operations.
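
To make the dispatch easier to picture, here is a small stand-alone sketch of the same pattern. It is not glibc code: every name in it (my_file, my_jump_table, MY_JUMP3, MY_SEEKOFF, plain_file_seekoff, my_seekoff_unlocked) is hypothetical and only mimics the shape of the macros described above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical miniature of a libio-style stream; none of these names
   come from glibc. */
struct my_file;

struct my_jump_table {
    long (*seekoff)(struct my_file *fp, long offset, int whence);
};

struct my_file {
    const struct my_jump_table *vtable; /* per-stream operations */
    long pos;                           /* current logical offset */
    long size;                          /* pretend file size */
};

/* The macro does not call a function named like itself; it looks the
   function pointer up in the stream's table and calls that. */
#define MY_JUMP3(FUNC, FP, A, B) ((FP)->vtable->FUNC((FP), (A), (B)))
#define MY_SEEKOFF(FP, OFF, WHENCE) MY_JUMP3(seekoff, FP, OFF, WHENCE)

/* One concrete implementation that a table entry can point to. */
static long plain_file_seekoff(struct my_file *fp, long offset, int whence)
{
    long base = (whence == SEEK_SET) ? 0
              : (whence == SEEK_CUR) ? fp->pos
              : fp->size;
    long target = base + offset;

    if (target < 0)
        return -1;
    fp->pos = target;
    return target;
}

static const struct my_jump_table plain_file_jumps = { plain_file_seekoff };

/* Looks recursive if MY_SEEKOFF is mistaken for a function, but the
   call actually lands in whatever fp->vtable->seekoff points to. */
long my_seekoff_unlocked(struct my_file *fp, long offset, int whence)
{
    assert(fp != NULL && fp->vtable != NULL);
    return MY_SEEKOFF(fp, offset, whence);
}
```

A stream initialised with `{ &plain_file_jumps, 0, 146 }` and passed to my_seekoff_unlocked with SEEK_END ends up in plain_file_seekoff, not back in my_seekoff_unlocked.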

So _IO_SEEKOFF calls _IO_file_jumps->__seekoff, which points to _IO_new_file_seekoff. Inside that function a call is made to _IO_SYSREAD. _IO_SYSREAD calls the __read entry from the jump table, which points to _IO_file_read; that calls __read, which finally executes SYSCALL_CANCEL (read).
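
The lseek followed by read in the trace fits a buffered seek that positions the descriptor at the start of the block containing the target and then refills the buffer up to the target. The sketch below illustrates that idea only; it is not glibc's implementation. The name seek_with_refill, its parameters, the power-of-two alignment, and the short-read loop are all assumptions made for this example.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not a glibc function: move fd to the start of
   the bufsize-aligned block that contains `target`, then read forward
   into buf until the descriptor's offset reaches `target`.
   bufsize must be a power of two. Returns the number of bytes now
   buffered (less than expected only if EOF comes first), or -1. */
ssize_t seek_with_refill(int fd, off_t target, char *buf, size_t bufsize)
{
    assert(buf != NULL);
    assert(target >= 0);
    assert(bufsize != 0 && (bufsize & (bufsize - 1)) == 0);

    /* Round the target down to a block boundary. */
    off_t block_start = target & ~((off_t)bufsize - 1);
    size_t delta = (size_t)(target - block_start);

    if (lseek(fd, block_start, SEEK_SET) == (off_t)-1)
        return -1;

    /* Refill: read the bytes between the block start and the target. */
    size_t got = 0;
    while (got < delta) {
        ssize_t n = read(fd, buf + got, delta - got);
        if (n < 0)
            return -1;
        if (n == 0)
            break; /* end of file before reaching target */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

With a target of 146 and a 4096-byte buffer, this helper issues lseek(fd, 0, SEEK_SET) and then a read of 146 bytes, the same pair of calls that appears in the strace output in the question.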

Konrad Rudolph
KamilCuk