5

How can I check whether the current write position is at the end of the file using low-level POSIX functions? My first idea is to use lseek and fstat:

off_t sk;
struct stat st;
sk = lseek (f, 0, SEEK_CUR);
fstat (f, &st);
return st.st_size == sk;

However, does st.st_size reflect the actual size, including data still buffered in the kernel, or only the size of the file on disk?

Another idea is to use

off_t scur, send;
scur = lseek (f, 0, SEEK_CUR);
send = lseek (f, 0, SEEK_END);
lseek (f, scur, SEEK_SET);
return scur == send;

but this doesn't seem to be a fast or adequate way.

Also, both ways seem to be non-atomic, so if another process is appending to the file, the size could change after the current offset has been checked.

Jonathan Leffler
Nick
  • 7
    The `st_size` is the size of the file at that time, using the latest data in kernel buffer pool, or on disk if it is not currently in use. – Jonathan Leffler Jul 02 '19 at 19:35
  • Maybe open it exclusive then append to the end. – 0___________ Jul 02 '19 at 19:37
  • 5
    And the moment you return from your check, the result is useless. If you want to ensure you atomically append to a file, use `O_APPEND` mode as designed. – Andrew Henle Jul 02 '19 at 19:41
  • 5
    Note that `lseek()` is a very lightweight system call; it does no I/O and simply changes a position in the in-memory control block for the file. You're right that both outline solutions have a TOCTOU (time of check, time of use) problem because they're non-atomic. I don't think there's an atomic call to do the job — positions of a file descriptor and size of a file are unrelated operations. A major question is "**why do you need to know this**"? If you use `O_APPEND`, all writes will be at the end. If you use `pwrite()`, it will write where you specify. – Jonathan Leffler Jul 02 '19 at 19:41
  • 4
    Out of curiosity, what would you do, not do, or do differently depending on whether the write position is or is not at the end? That might affect how we answer. – Ken Thomases Jul 02 '19 at 19:42
  • 3
    @JonathanLeffler *I don't think there's an atomic call to do the job* There isn't. The POSIX way to be able to atomically append to a file while also being able to write to any location is to open it in append mode and use `pwrite()` to write to a desired offset. Unfortunately, that's broken on Linux, where `pwrite()` will append to a file in `O_APPEND` mode no matter what the offset passed to `pwrite()` is. – Andrew Henle Jul 02 '19 at 19:44
  • @P__J__ , the whole point is to allow multi user access. – Nick Jul 02 '19 at 19:45
  • You're kidding, @AndrewHenle? Linux is that broken? That's ludicrous! POSIX [`pwrite()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/pwrite.html): _The `pwrite()` function shall be equivalent to `write()`, except that it writes into a given position and does not change the file offset (regardless of whether O_APPEND is set)._ So Linux is not implementing POSIX at this point. That's a serious breakage, were I asked. – Jonathan Leffler Jul 02 '19 at 19:45
  • 2
    @JonathanLeffler Just in case you're serious: http://man7.org/linux/man-pages/man2/pwrite.2.html#BUGS ;-) "However, on Linux, if a file is opened with `O_APPEND`, `pwrite()` appends data to the end of the file, regardless of the value of `offset`." – Andrew Henle Jul 02 '19 at 19:47
  • 1
    Oh …censored (thoroughly NSFW string of expletives omitted)… . That's a major screw-up. I had been planning to use `pwrite()`. I've just stopped because it isn't portable. Hell, that's a Microfaustian sort of stupid trick to implement. (Yeah, it's only a problem if `O_APPEND` is used when opening the file, and likely that'd not be a problem, but it is still a gratuitous breakage of the standard worthy of a vintage-2000 Microsoft implementation.) – Jonathan Leffler Jul 02 '19 at 19:49
  • 1
    @Ken Thomases , if not at the end, then there should not be any write at all (another process appended to the file and the data should not be overwritten but read and handled in a different way). The append mode would silently append to the end, possibly mixing data that comes from several processes. No, a mutex is not a suitable solution, unfortunately. – Nick Jul 02 '19 at 19:51
  • @JonathanLeffler I *think* you can open the file twice - one descriptor with `O_APPEND`, one without. – Andrew Henle Jul 02 '19 at 19:52
  • Yeah, but I shouldn't have to worry about it. POSIX says so. Oh well. Thanks a _lot_ for the information; I would not have gone looking for that bug for a long, long time! – Jonathan Leffler Jul 02 '19 at 19:53
  • 1
    @Nick *The append mode would silently append to the end, possible mixing data that comes from several processes.* Append is atomic, so if you use `write()`, each full write is **supposed** to be atomic. That probably has an effective limit, see https://stackoverflow.com/questions/1154446/is-file-append-atomic-in-unix But note that calls such as `fwrite()` do not translate one-to-one to the actual system call that writes the data. – Andrew Henle Jul 02 '19 at 19:53
  • @Nick atomicity implies mutual exclusion. For the time you append to the end other processes cannot do the same. No miracles – 0___________ Jul 02 '19 at 19:54
  • @Andrew Henle, atomic for a single write, but not for several consecutive write calls. – Nick Jul 02 '19 at 19:59
  • 2
    If you ensure that your `write` call writes all the data in one swell foop, there'll be no problem with `O_APPEND` mode and interleaved output. If you try to dribble your data out to a file descriptor with multiple write operations, you'll have problems. You'll never be able to tell whether another thread or process has already broken the condition you require. You're best off trusting the `O_APPEND` mode operation. If you have multiple open file descriptions (roughly, multiple open file descriptors) for the same file, then you'll have problems too — it'll be safest if they're all `O_APPEND`. – Jonathan Leffler Jul 02 '19 at 20:00
  • I observe that POSIX is silent on how the [`dprintf()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/dprintf.html) function writes to the file descriptor — whether it is analogous to `sprintf()` plus a single `write()` or whether it can write multiple bits and pieces to the file descriptor as it processes the formatting for a single call. – Jonathan Leffler Jul 02 '19 at 20:07
  • @JonathanLeffler *I observe that POSIX is silent on how the `dprintf()` function writes to the file descriptor* The [GLIBC implementation](https://github.com/bminor/glibc/blob/7628a1b05adb1e4c6857b87c6f8b71a1d0b1d72c/stdio-common/dprintf.c) effectively does an `fdopen()` in the descriptor, so it's likely to do multiple underlying `write()` calls. Writing a version that does expand via `vs[n]printf()` and call `write()` **once** shouldn't be too hard. – Andrew Henle Jul 02 '19 at 20:29
  • 1
    @Nick : *atomic for single write, but not several consequent write calls* do you mean, for example, that several processes are reading a data source such that they might all read the same data and you want only one of them to write that data to the output file in question? you want the other readers to say "oh, someone else beat me to it" and behave differently? – landru27 Jul 02 '19 at 20:35
  • 1
    Is using [`feof`](http://man7.org/linux/man-pages/man3/feof.3p.html) (via `fdopen`) acceptable for your use-case? — I understand that this isn’t a “low-level” POSIX function but it will effectively do under the hood what you’re suggesting, and which you’re not happy with. Otherwise the canonical way to test for EOF at the current pointer is to combine this test with a `read` operation. – Konrad Rudolph Jul 02 '19 at 23:30

1 Answer

-1

However, does st.st_size reflect the actual size, including data still buffered in the kernel, or only the size of the file on disk?

I don't understand what you mean by kernel buffered data. The number in st_size is the size of the file in bytes. So, if the file has 1000000 bytes, st_size will be 1000000, with byte positions from 0 to 999999.

There are two ways to get the file size in POSIX systems:

  • call off_t saved = lseek(fd, 0, SEEK_CUR); to get the current position (you must save it to restore it later), then off_t file_size = lseek(fd, 0, SEEK_END);, which moves to the end of the file and returns that offset (the position just after the last byte), and finally lseek(fd, saved, SEEK_SET); to return to where you were. If you check, file_size will match the value in st_size.
  • call fstat(2) on the file descriptor to get the st_size value you mentioned.

The first way has a drawback if multiple threads or processes share the file descriptor with you (by means of a dup(2) system call, or a fork()ed process): if one of them does a read(2), write(2), or lseek(2) call between your lseek calls, you'll lose the position you previously had in the file and won't be able to return to the correct place. That makes the first approach inadvisable.
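
If the goal is only to compare the offset with the size, a sketch that never moves the offset could combine lseek(fd, 0, SEEK_CUR) with fstat(2). The function name at_eof is hypothetical, and the result is still only a snapshot that another writer can invalidate right after the call returns.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the descriptor's offset equals the
 * file size, 0 if it does not, and -1 on error. */
int at_eof(int fd)
{
    struct stat st;

    /* Offset 0 relative to SEEK_CUR only queries the position. */
    off_t pos = lseek(fd, 0, SEEK_CUR);
    if (pos == (off_t)-1)
        return -1;

    if (fstat(fd, &st) == -1)
        return -1;

    /* Not atomic: the file may grow between lseek() and fstat(),
     * or immediately after this comparison. */
    return pos == st.st_size;
}
```

Because the offset is never changed, other users of a shared descriptor are not disturbed, but the time-of-check/time-of-use race discussed in the comments remains.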

Last, there's no relationship between the file buffering done in the kernel and the file size. You always get the true file size from stat(2). The only thing that may be confusing you is the space saving done by the kernel when you run the following snippet (this is transparent to you and you don't have to account for it, unless you are going to copy the file to another place). Just run this tiny program:

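#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    /* Create (or truncate) the file, then seek 1000000 bytes past the start. */
    int fd = open("file", O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd == -1)
        return 1;
    lseek(fd, 1000000, SEEK_SET);
    /* sizeof string is 13: twelve characters plus the terminating NUL. */
    char string[] = "Hello, world";
    write(fd, string, sizeof string);
    close(fd);
    return 0;
}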

You will end up with a 1000013-byte file that uses only one or two blocks of disk space. That's a file with a hole: there are 1000000 zero bytes before the string you wrote, and the system doesn't allocate disk blocks for them. Only when you write to those blocks does the system allocate new blocks to hold your data; until then, the system shows you zero bytes that are not stored anywhere.

$ ll file
-rw-r-----  1 lcu  lcu  1000013  4 jul.  11:52 file
$ hd file
[file]:
00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 :................
*
000f4240: 48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 00          :Hello, world.
000f424d
$ _
Luis Colorado
  • *you will end with a `1000013` bytes file, but that uses only one or two blocks of disk space.* Only if the underlying file system supports [sparse files](https://en.wikipedia.org/wiki/Sparse_file). Not all filesystems do. – Andrew Henle Jul 04 '19 at 13:00
  • UNIX, since almost its origins, does support it. My BSD system tells me around 4Mb of disk usage (st_blocksz * st_blocks) but I'm aware that it does support it. The idea is to know that at least, the `st_size` is a good and reliable way to know. – Luis Colorado Jul 04 '19 at 13:09
  • @AndrewHenle, at&t unix v7 supports sparse files, so I'm afraid almost all unices do. Of course, that's not true in non native filesystems (like iso cdrom images, or rare filesystems) – Luis Colorado Jul 04 '19 at 14:12