
Assume we have a single-core CPU running

int filedesc = open("foo.txt", O_RDONLY);

filedesc is a variable in the user process. When open begins executing, the CPU context-switches into the kernel. How is the return value of open passed back to filedesc?

Additionally, compared to

FILE *file = fopen("foo.txt", "r");

reading a file with fopen is much faster due to buffering, but under the hood it calls open. In this case, does open still retrieve one byte at a time? If so, there would be context-switch overhead for every byte, since fopen's buffer lives in the user process. Given the back-and-forth return-value passing from my first question, how come it runs faster? Thanks in advance!

mzoz
  • I suggest you have a look at the sources for the C library you are using. It seems the source for glibc can be found at the following address: http://sourceware.org/git/?p=glibc.git;a=blob;f=libio/iofopen.c;h=24e24a93136f1d19108b44d879bf7450ee07a016;hb=HEAD – kungjohan Aug 24 '20 at 12:14
  • "how come it runs faster" -- how did you measure this? – th33lf Aug 24 '20 at 12:26
  • Hi @th33lf I watched this tutorial about open/fopen by Prof Sorber https://www.youtube.com/watch?v=BQJBe4IbsvQ&list=PL9IEJIKnBJjG5H0ylFAzpzs9gSmW_eICB&index=12 – mzoz Aug 24 '20 at 12:37
  • @mzoz In that example, he is not timing `open` by itself. He calls `open`, `read` and `write`, out of which `open` probably forms a tiny, insignificant part of the overhead. It is called only once, while the other calls are made in a loop! Most of the time would be spent in read and write. – th33lf Aug 24 '20 at 12:48
  • 1
    "How is system call return value passed back to user process?" There is an application binary interface (ABI) specification between the kernel and user space that defines (amongst other things) how parameters are passed to system calls and values are returned. – Ian Abbott Aug 24 '20 at 16:05

2 Answers


"fopen is much faster [than open] due to buffering, but under the hood it calls open..."
In general, if the implementation of function1() includes a call to function2(), then calling function2() directly, with the same option set as when it is called by function1(), will always have a shorter execution time. If you are seeing the opposite with fopen() and open(), the option set used when you call open() directly would have to differ from the one used when it is called inside fopen(). But the internal do_sys_open() receives the same arguments either way, so a speed differential for that reason is not possible. You should question your benchmarking technique.

Regarding how the return value is passed back to the user...
Linux system calls are defined using variations of the SYSCALL_DEFINEn macro. The implementation of open() below illustrates this. Note the const char __user * annotation on the filename argument, in both the macro and do_sys_open(): it marks the pointer as referring to user-space memory, so the kernel knows the data must be copied across the user/kernel boundary. The function's return value (a file descriptor, or a negative error code) is then handed back to user space through a register, as the ABI dictates:

long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
{
    struct open_flags op;
    int fd = build_open_flags(flags, mode, &op);
    struct filename *tmp;

    if (fd)
        return fd;

    tmp = getname(filename);
    if (IS_ERR(tmp))
        return PTR_ERR(tmp);

    fd = get_unused_fd_flags(flags);
    if (fd >= 0) {
        struct file *f = do_filp_open(dfd, tmp, &op);
        if (IS_ERR(f)) {
            put_unused_fd(fd);
            fd = PTR_ERR(f);
        } else {
            fsnotify_open(f);
            fd_install(fd, f);
        }
    }
    putname(tmp);
    return fd;
}

SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode)
{
    if (force_o_largefile())
        flags |= O_LARGEFILE;

    return do_sys_open(AT_FDCWD, filename, flags, mode);
}
ryyker

I guess you are a bit confused here. When you say fopen() is faster, what you actually mean is that fread() and fwrite() are faster than read() and write(). This can be true in many implementations, because the C standard library buffers in user space while the POSIX read()/write() calls do not. They may, however, be buffered in kernel space.

Let's say you are copying a 1 KiB file. If you do this one byte at a time, using read() to get one byte from the source and write() to copy it into the destination, you end up invoking each of those system calls 1024 times, and every invocation incurs a context switch between user space and kernel space. On the other hand, if you use a C library implementation with, say, a 512-byte internal buffer, those accesses translate into only two system calls each, even though you call fread() and fwrite() 1024 times. Hence it appears significantly faster than using read()/write() directly.

But then, instead of copying one byte at a time, you could also have used a large enough buffer in your own application, calling read()/write() as few times as possible, and achieved equal or better performance with the system calls directly. In other words, it is not that the standard library API is faster than the system call (that is impossible, since the library invokes the system calls internally); it is simply more efficient to invoke read()/write() with larger buffers because of the context-switch overhead, and the standard library has been written with this in mind.

th33lf
  • Thanks I've edited my question to clear the ambiguity. Could you confirm that buffer is maintained in kernel space? How does user process in user space access the buffer? Thanks a lot! – mzoz Aug 24 '20 at 13:41
  • No, the buffer (in the case of the C library) is maintained in user space. That is the whole point. When you call `fwrite()` the data is not immediately sent to the kernel and is instead buffered locally. So multiple fwrites can happen without any overhead. Then, when enough writes have completed to fill the buffer, or depending on some other condition, the `write()` system call is finally invoked, and the data is copied from user space to a kernel-space buffer. The intent is to minimize the number of times this needs to happen. – th33lf Aug 24 '20 at 13:46
  • Once the data reaches kernel space, it may be buffered again by, say, the file system driver before it is actually committed to the file, but that is out of scope of our discussion. – th33lf Aug 24 '20 at 13:51
  • @mzoz Really curious, any reason why you've accepted the other answer? – th33lf Aug 24 '20 at 15:47
  • Hi @th33lf, because the other one addresses the detailed mechanism of data passing between user and kernel processes, which is of more interest to me.. thanks for your time! – mzoz Aug 25 '20 at 00:40
  • @mzoz Except it doesn't - it's just a copy-paste of linux source code. And that answer talks about open when actually you meant read & write as you yourself admitted! If anything, [this comment](https://stackoverflow.com/questions/63560714/how-is-system-call-return-value-passed-back-to-user-process/63561711?noredirect=1#comment112402117_63560714) on your answer is what actually talks about how values are returned. Anyways, it's your question so it's your call to not upvote or choose as the answer, but I have to say it feels really weird! – th33lf Aug 25 '20 at 09:40