However does st->st_size reflect the actual size but not the disk file size, i.e. not including kernel buffered data?
I don't understand what you mean by kernel buffered data. The number in st->st_size is the size of the file in bytes. So, if the file holds 1000000 bytes, st->st_size will be 1000000, with byte positions running from 0 to 999999.
There are two ways to get the file size on POSIX systems:
- Save the current position with off_t saved = lseek(fd, 0, SEEK_CUR); then call off_t file_size = lseek(fd, 0, SEEK_END); which moves to the end of the file and returns that offset (the position just after the last byte, which is the file size), and finally call lseek(fd, saved, SEEK_SET); to go back to where you were before. If you check, file_size matches the value in st->st_size.
- Call fstat(2) on the file descriptor (or stat(2) on the path name) and read the st_size field you mentioned.
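As a minimal sketch of the two approaches listed above, assuming an already open descriptor: the function names size_by_lseek and size_by_fstat are made up for this illustration, and both return -1 on error.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: size via lseek(2).
 * Saves the current offset, seeks to the end, then restores the offset. */
off_t size_by_lseek(int fd)
{
    off_t saved = lseek(fd, 0, SEEK_CUR);   /* where we are now */
    if (saved == (off_t)-1)
        return -1;

    off_t size = lseek(fd, 0, SEEK_END);    /* end offset == file size */
    if (size == (off_t)-1)
        return -1;

    if (lseek(fd, saved, SEEK_SET) == (off_t)-1)  /* go back */
        return -1;

    return size;
}

/* Hypothetical helper: size via fstat(2).
 * The file offset is never touched. */
off_t size_by_fstat(int fd)
{
    struct stat st;

    if (fstat(fd, &st) == -1)
        return -1;
    return st.st_size;
}
```

For a regular file both helpers should return the same number; the second one does it in a single system call and leaves the offset alone.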
The first way has a drawback if other threads or processes share the file descriptor with you (by means of a dup(2) system call, or a fork()ed process): they share the file offset too. If one of them does a read(2), write(2), or lseek(2) call between your lseek calls, the position you saved is no longer the right one to return to, and restoring it leaves the file at the wrong place. That is hard to debug, and it is why I don't recommend the first approach.
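The shared offset is easy to observe. The sketch below is only an illustration (the function name offset_after_dup_seek is hypothetical): it moves a dup(2) copy of a descriptor and then reports the offset seen through the original one.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: seek a duplicate of fd to `target`, then return
 * the offset as seen through the original fd (-1 on error).
 * Both descriptors refer to the same open file description, so the
 * original fd is expected to report `target` as well. */
off_t offset_after_dup_seek(int fd, off_t target)
{
    int copy = dup(fd);
    if (copy == -1)
        return -1;

    off_t moved = lseek(copy, target, SEEK_SET);
    close(copy);                      /* closing the copy keeps fd open */
    if (moved == (off_t)-1)
        return -1;

    return lseek(fd, 0, SEEK_CUR);    /* query, without moving */
}
```

Any code that saves an offset, seeks elsewhere and restores it is exposed to exactly this kind of interference, which fstat(2) avoids because it never moves the offset.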
Last, the file buffering done by the kernel has no bearing on the file size. You always get the true file size from stat(2). The only thing that might be confusing you is the space saving the kernel does when you run the following snippet (this is transparent to you and you don't have to account for it, unless you are going to copy the file to another place). Just run this tiny program:
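#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("file", O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd == -1)
        return 1;
    /* Move 1000000 bytes past the start without writing anything. */
    lseek(fd, 1000000, SEEK_SET);
    char string[] = "Hello, world";
    /* sizeof string counts the terminating '\0', so 13 bytes are written. */
    write(fd, string, sizeof string);
    close(fd);
    return 0;
}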
You will end up with a 1000013-byte file that uses only one or two blocks of disk space. That's a sparse file (a file with a hole): there are 1000000 zero bytes before the string you wrote, and the system doesn't allocate disk blocks for them. Only when you write into that range does the system allocate new blocks to hold your data. Until then, the system shows you zero bytes that are not stored anywhere.
$ ll file
-rw-r----- 1 lcu lcu 1000013 4 jul. 11:52 file
$ hd file
[file]:
00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 :................
*
000f4240: 48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 00 :Hello, world.
000f424d
$ _
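To see the difference between the size stat(2) reports and the space the file really takes, you can compare st_size with st_blocks. This is a rough sketch, not a definitive tool: the function name file_sizes is made up, and it assumes st_blocks counts 512-byte units, which is the case on Linux and most Unix systems but is not guaranteed by POSIX.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: report the apparent size (st_size) and the
 * allocated size (st_blocks * 512) of the file at `path`.
 * ASSUMPTION: st_blocks is expressed in 512-byte units.
 * Returns 0 on success, -1 if stat(2) fails. */
int file_sizes(const char *path, off_t *apparent, off_t *allocated)
{
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;

    *apparent  = st.st_size;                 /* what ls -l shows */
    *allocated = (off_t)st.st_blocks * 512;  /* roughly what du shows */
    return 0;
}
```

For the file created by the program above, apparent is 1000013, while allocated is typically only a few kilobytes on filesystems that support holes. On a filesystem without sparse file support the two numbers would be close to each other.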