timestamp accuracy on EXT4 (sub-millisecond)

Question

I was writing some code in Vala where I would first get the system time, then create a file, then retrieve the timestamp of that file. The timestamp was always earlier than the system time, by somewhere between 500 and 1500 microseconds, which didn't make sense.

I then wrote a simple shell script:

while true; do
touch ~/tmp/fred.txt
stat ~/tmp/fred.txt|grep ^C
done

With the following result:

Change: 2013-01-18 16:02:44.290787250 +1100
Change: 2013-01-18 16:02:44.293787250 +1100
Change: 2013-01-18 16:02:44.296787250 +1100
Change: 2013-01-18 16:02:44.298787248 +1100
Change: 2013-01-18 16:02:44.301787248 +1100
Change: 2013-01-18 16:02:44.304787248 +1100
Change: 2013-01-18 16:02:44.306787248 +1100
Change: 2013-01-18 16:02:44.309787248 +1100
Change: 2013-01-18 16:02:44.312787248 +1100
Change: 2013-01-18 16:02:44.315787248 +1100

As you can see, the first three digits after the decimal point (milliseconds) seem OK, since they increment as expected, but the fourth digit onward does not look right: digits 4 to 9 seem to be slowly counting down instead. Is there any reason for this? I thought ext4 supports nanosecond precision. The Access and Modify timestamps behave in the same way.
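The pattern is easier to see if you compute the gap between successive `Change:` lines. Below is a minimal bash sketch. The function name `ctime_deltas` is hypothetical. It assumes the exact `stat` format shown above (an `HH:MM:SS` clock with nine fractional digits) and that all lines fall on the same day.

```bash
#!/usr/bin/env bash
# ctime_deltas (hypothetical helper): read `stat` output on stdin and print
# the gap in nanoseconds between consecutive "Change:" timestamps.
# Assumes HH:MM:SS.nnnnnnnnn clocks that all fall on the same day.
ctime_deltas() {
  local prev="" label day clock tz h m s frac now
  while read -r label day clock tz; do
    [[ $label == "Change:" ]] || continue        # skip Access/Modify/etc.
    IFS=':.' read -r h m s frac <<< "$clock"     # split 16:02:44.290787250
    # 10# forces base 10 so fields such as "08" are not parsed as octal.
    now=$(( ((10#$h * 60 + 10#$m) * 60 + 10#$s) * 1000000000 + 10#$frac ))
    if [[ -n $prev ]]; then
      echo $(( now - prev ))
    fi
    prev=$now
  done
}
```

For the ten lines above, it prints gaps of 3000000, 3000000, 1999998, 3000000, 3000000, 2000000, 3000000, 3000000 and 3000000 ns. These are whole milliseconds, give or take 2 ns, which is why the sub-millisecond digits barely move.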

Austin Phillips · Accepted Answer · score 19 · 2013-01-18T06:41:17.770

The ext4 file system does support nanosecond resolution on stored times, provided the inodes are large enough (256 bytes or more) to hold the extended time information. In your case the timestamps already show sub-second digits, so inode size is not the problem.

Internally, the ext4 filesystem code calls current_fs_time(). This returns the current cached kernel time, truncated to the time granularity specified in the file system's superblock. For ext4 that granularity is 1 ns.

The current time within the Linux kernel is cached and is generally updated only on a timer interrupt. So if your timer interrupt fires every 10 milliseconds, the cached time is updated only once every 10 milliseconds. When an update does occur, the accuracy of the resulting time depends on the clock source available on your hardware.
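The interaction between the tick and the granularity truncation can be modelled with a little integer arithmetic. This is a simplified bash model, not kernel code. `fs_stamp_ns` is a hypothetical name. The model assumes ticks land on exact multiples of the tick period, whereas on real hardware the tick phase is arbitrary.

```bash
#!/usr/bin/env bash
# fs_stamp_ns T TICK GRAN (hypothetical): the timestamp a file system would
# record for an operation at true time T (ns), if the cached clock was last
# updated at the most recent tick boundary and the result is then truncated
# to the file system granularity GRAN (1 ns for ext4).
# Simplification: ticks are assumed to fall on exact multiples of TICK.
fs_stamp_ns() {
  local t=$1 tick=$2 gran=$3 cached
  cached=$(( t - t % tick ))          # cached kernel time at the last tick
  echo $(( cached - cached % gran ))  # truncate to the fs granularity
}

# A 250 Hz kernel ticks every 4000000 ns.
t=12345678901
stamp=$(fs_stamp_ns "$t" 4000000 1)
echo "recorded $stamp ns, $(( t - stamp )) ns before the true time"
```

This prints `recorded 12344000000 ns, 1678901 ns before the true time`. With a granularity of 1 ns, the truncation step is a no-op. The file's time can therefore trail the real clock by anything up to one full tick, which is consistent with the 500 to 1500 µs lag described in the question.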

Try this and see whether you get results similar to your stat calls:

while true; do date --rfc-3339=ns; done

On my machine (amd64, Intel, VirtualBox) there is no quantization, e.g.:

2013-01-18 17:04:21.097211836+11:00
2013-01-18 17:04:21.098354731+11:00
2013-01-18 17:04:21.099282128+11:00
2013-01-18 17:04:21.100276327+11:00
2013-01-18 17:04:21.101348507+11:00
2013-01-18 17:04:21.102516837+11:00

Update:

The above check using date doesn't really show anything for this situation. This is because date calls the gettimeofday system call. That call always returns the most accurate time available: the cached kernel time, adjusted by the CPU cycle counter where available to give nanosecond resolution. The timestamps stored in the file system, however, are based only on the cached kernel time, i.e. the time calculated at the last timer interrupt.
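The explanation can be checked directly. Compare a high-resolution clock reading taken just before `touch` with the timestamp the file actually receives. This is a minimal bash sketch, assuming GNU coreutils (`date +%N` and `date -r`). `stamp_lag_ns` is a hypothetical name. It uses the modification time, which, per the question, behaves like the change time.

```bash
#!/usr/bin/env bash
# stamp_lag_ns FILE (hypothetical): touch FILE and print how many ns its
# modification time is *earlier* than a clock reading taken before the touch.
# A positive result means the kernel stamped the file with a cached time.
# Requires GNU coreutils: date +%s%N and date -r FILE.
stamp_lag_ns() {
  local f=$1 before stamp
  before=$(date +%s%N)            # high-resolution clock, read first
  touch -- "$f"                   # kernel sets mtime/ctime from its own notion of "now"
  stamp=$(date -r "$f" +%s%N)     # the mtime the file system actually stored
  echo $(( before - stamp ))
}
```

For example, run `for i in 1 2 3 4 5; do stamp_lag_ns ~/tmp/fred.txt; done`. A consistently positive lag is the cached-time effect described above, because the clock was read *before* the file was touched. If the file were stamped with a fine-grained clock, the values should be negative instead.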

edited Jan 18 '13 at 06:41 · answered Jan 18 '13 at 06:06 · Austin Phillips

`while true; do date --rfc-3339=ns; done` works fine for me: 2013-01-18 17:38:33.288373231+11:00 2013-01-18 17:38:33.288966248+11:00 2013-01-18 17:38:33.289559102+11:00 2013-01-18 17:38:33.290142562+11:00 – Wayne Jan 18 '13 at 06:38
I just need microsecond resolution, so I would like to know which component of my system needs rectifying. – Wayne Jan 18 '13 at 06:46
Modifying the ext4 fs layer is probably out of the question. So I'd suggest using the `gettimeofday` system call to get the most accurate version of time possible, then use the `utimensat` system call to modify the file's time stamp. – Austin Phillips Jan 18 '13 at 06:54
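For files you create yourself, this suggestion can be approximated from the shell. This bash sketch assumes GNU coreutils. `touch -d @SECONDS.NANOS` sets the access and modification times explicitly (on Linux, GNU touch is typically built on the utimensat family of calls). `date +%s.%N` stands in for gettimeofday. The helper name `stamp_with_clock` is hypothetical. The change time cannot be set this way, because the kernel always sets it itself.

```bash
#!/usr/bin/env bash
# stamp_with_clock FILE (hypothetical): create/touch FILE and set its access
# and modification times to a high-resolution clock reading, instead of
# the kernel's cached time. Prints the value that was applied.
# Requires GNU coreutils (date +%N, touch -d @EPOCH.FRACTION).
stamp_with_clock() {
  local f=$1 now
  now=$(date +%s.%N)              # seconds.nanoseconds since the epoch
  touch -d "@$now" -- "$f"        # explicit atime/mtime with the fraction
  echo "$now"
}
```

Run `stamp_with_clock ~/tmp/fred.txt`, then `date -r ~/tmp/fred.txt +%s.%N`. The second command should show the same nanosecond digits, provided the file system stores nanoseconds (for example, ext4 with 256-byte inodes).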

Since the files whose timestamps I need are out of my control (I'm not creating them), the above suggestion is not really an option. However, the updated explanation about the cached kernel time does explain what I'm seeing. Are there any kernel parameters to tweak this behavior? If not, this extra precision seems next to useless to me. Are there any articles on this phenomenon? Thanks. – Wayne Jan 18 '13 at 08:28

Since the ext4 fs driver chooses which time to use, you don't have many options unless you modify the driver. The precision is related to the timer tick frequency. Based on your output, your kernel HZ value seems to be around 300. Mine has been compiled with 250 Hz, so I don't get any better resolution than 4 ms. You could recompile your kernel with a higher HZ value, but even at 1000 Hz that only gets you to 1 ms resolution. – Austin Phillips Jan 18 '13 at 10:10
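The resolution figures in this comment follow from the tick period, which is 1 s divided by HZ. Here is a trivial bash sketch using a hypothetical helper, `tick_ns`:

```bash
#!/usr/bin/env bash
# tick_ns HZ (hypothetical): length of one timer tick in nanoseconds,
# i.e. the best resolution a tick-updated cached clock can offer.
tick_ns() {
  echo $(( 1000000000 / $1 ))     # integer division: 300 Hz -> 3333333
}

for hz in 100 250 300 1000; do
  printf '%4d Hz: %d ns per tick\n' "$hz" "$(tick_ns "$hz")"
done
```

This gives 10 ms at 100 Hz, 4 ms at 250 Hz, about 3.3 ms at 300 Hz and 1 ms at 1000 Hz. Some distributions ship the kernel configuration in /boot. On those, `grep '^CONFIG_HZ=' /boot/config-$(uname -r)` shows the value your kernel was built with.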

If it's critical, change the ext4 kernel code. I suspect the cached time is used because reading the cache is very fast. It would probably be easy enough to change it to get nanosecond resolution, at the cost of worse performance when computing file system times. http://www.linuxsymposium.org/2005/linuxsymposium_procv1.pdf, page 219 onwards, has some in-depth discussion of the kernel timer system. It's not directly related, but it provides some background. – Austin Phillips Jan 18 '13 at 10:14
There have been [recent patches posted](https://lore.kernel.org/linux-fsdevel/20230411142708.62475-1-jlayton@kernel.org/) to use high-resolution timestamps when the times have been previously read. [This response to review](https://lore.kernel.org/linux-fsdevel/fb17a0931ae29b89d661b7b2295726689c350ae3.camel@kernel.org/#t) says "the extra fine-grained updates should be relatively rare and should (hopefully!) not cause noticeable performance blips.", confirming performance is a concern. – Jeffrey Bosboom Apr 25 '23 at 07:48

score 0 · Answer 2 · answered Jun 29 '23 at 13:18

Ext4 only supports nanosecond timestamps if the inodes are 256 bytes or larger.

The default inode size used by mkfs.ext4 depends on your distro. Some "low-power systems" distros still use 128 bytes to save disk space. Some other distributions apply the rule "if the partition is smaller than 512 MiB, use 128-byte inodes".

Having 128-byte inodes means that the filesystem only has 1-second precision timestamps (no decimals). It also leaves no room inside the inode for extended attributes such as ACLs and SELinux labels. Most seriously of all, 128-byte inodes (second-precision timestamps) suffer from the Y2K38 bug, meaning that timestamps after 2038 are not supported.

You can force the inode size with `mkfs.ext4 -I 256` when formatting. You can also change it later on an existing (unmounted) filesystem with `tune2fs -I 256 /dev/<the_device>`.
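You can check an existing filesystem before relying on sub-second timestamps. Read the inode size from `tune2fs -l` (or `dumpe2fs -h`), which prints a line such as `Inode size: 256`. This is a bash sketch; the function names are hypothetical, and the 256-byte threshold is the one given above.

```bash
#!/usr/bin/env bash
# inode_size (hypothetical): read `tune2fs -l` output on stdin and print
# the value of its "Inode size:" line.
inode_size() {
  local key val
  while IFS=: read -r key val; do
    if [[ $key == "Inode size" ]]; then
      echo "${val//[[:space:]]/}"   # strip the padding around the number
      return 0
    fi
  done
  return 1                          # no "Inode size:" line found
}

# timestamp_precision (hypothetical): classify the filesystem by inode size.
timestamp_precision() {
  local size
  size=$(inode_size) || { echo "inode size not found" >&2; return 1; }
  if (( size >= 256 )); then
    echo "inode size $size: nanosecond timestamps"
  else
    echo "inode size $size: 1-second timestamps, no dates after 2038"
  fi
}
```

Usage: `sudo tune2fs -l /dev/<the_device> | timestamp_precision`.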
