0

I want to use `cat filepath > /dev/null` as a cheap way to pull a file into the memory cache. What I am wondering is: if I call it a second time and the file is already in the disk cache, is the OS smart enough to do nothing?

Update: I've tested this on a CIFS volume, using fadvise with POSIX_FADV_WILLNEED to cache the file locally (via linux-ftools on the command line). It turns out that the volume needs to be mounted in read-write mode for this to work. In read-only mode, the fadvise seems to be ignored. This must have something to do with the Samba oplock mechanism.

Jacko

4 Answers

3

No, and it cannot.

Determining if a program will do nothing is usually more complex than just running it.

Why do you need to control the memory caching anyway? If absolutely necessary, consider using a tmpfs filesystem or compcache (a compressed RAM block device).

BatchyX
  • Thanks. I want to cache files from a CIFS mount, so that they will be available in RAM the next time I read them. But, I don't want to keep track of whether the file has been cached or not. – Jacko May 22 '11 at 19:37
  • For that, the kernel needs a way to tell "this file hasn't been modified since the last time I read it". I don't think that CIFS provides a way to let a server tell a client that a file has not been modified (no, modification time doesn't count, as it is fake) – BatchyX May 23 '11 at 19:10
  • I think there is level II op-locking for CIFS. http://msdn.microsoft.com/en-us/library/aa302210.aspx – Jacko May 23 '11 at 23:20
3

It is better to call posix_fadvise(..., POSIX_FADV_WILLNEED) than to cat the file to /dev/null: it requires less actual I/O, and the file contents don't have to be copied into userspace RAM, which would pollute the CPU caches.

Moreover, if the relevant part of the file is already in the cache, the posix_fadvise will probably do a lot less work than cat file > /dev/null.
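A minimal sketch of the posix_fadvise approach, written in Python for brevity (`os.posix_fadvise` is a thin wrapper over the system call and is only available on Unix). The helper name `prefetch` is made up for this example, and whether the hint is honoured depends on the kernel and the filesystem:

```python
import os

def prefetch(path):
    """Hint to the kernel that the whole file will be needed soon.

    Returns the file size in bytes. The call only queues readahead;
    it does not wait for the data to arrive in the page cache.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0 and len=0 together mean "the whole file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
        return os.fstat(fd).st_size
    finally:
        os.close(fd)
```

Calling `prefetch("/mnt/share/somefile")` (a placeholder path) repeatedly is harmless; no file data is copied into the process either way.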

If you feel that you really need the pages to be in core just now, then mmap the relevant section of the file and mlock it (unlock it afterwards; it might get discarded immediately if memory pressure is tight). That needs root privileges.
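A rough Python sketch of the mapping idea, with one substitution that should be stated plainly: Python's standard library does not expose mlock, so instead of locking the mapping this reads one byte from every page, which faults the pages in without pinning them. The function name `touch_pages` is hypothetical, and an empty file cannot be mapped (`mmap` raises `ValueError`):

```python
import mmap

def touch_pages(path):
    """Map the file read-only and read one byte from each page.

    Returns the number of pages touched. Pages are faulted in but not
    locked, so the kernel may still evict them under memory pressure.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            touched = 0
            for offset in range(0, len(m), mmap.PAGESIZE):
                m[offset]  # the read itself triggers the page fault
                touched += 1
            return touched
```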

In general doing this kind of thing is a pessimisation and should be avoided, however. Forcing the kernel to behave how you want may reduce its ability to optimise the actual workload just now.

MarkR
  • Using `mlock()` just to fetch the pages into memory is a bit agricultural - for one thing, as you've pointed out, it requires root. You can achieve the same thing just by reading a single byte from every page in the mapping - or even better, specifying the `MAP_POPULATE` flag to `mmap()`. – caf May 23 '11 at 05:49
  • Yeah ok, MAP_POPULATE does it in a less brutal way. Still not necessarily a good idea :) – MarkR May 23 '11 at 10:03
  • Thanks, MarkR. I will try posix_fadvise. Will this work for files on a CIFS mount, in read only mode? – Jacko May 23 '11 at 17:49
  • From Linux man pages: "POSIX_FADV_WILLNEED and POSIX_FADV_NOREUSE both initiate a non-blocking read of the specified region into the page cache. The amount of data read may be decreased by the kernel depending on VM load. (A few megabytes will usually be fully satisfied, and more is rarely useful.)". So, this would be perfect for me, these files are around 100 K each. – Jacko May 23 '11 at 17:51
  • caf- could you explain your use of the term "agricultural" ? – Jacko May 23 '11 at 18:01
  • @Jacko: I think it derives from the game of cricket, where an "agricultural shot" is a wild swinging shot played with very little technique or finesse. – caf May 24 '11 at 00:48
2

It won't do nothing, as the other answers have said. But if what you really meant was:

If I call it a second time, and the file is already in the disk cache, is the OS smart enough to not read it from disk a second time?

... then the answer is yes¹. That's how the disk cache works, after all.


1. As long as the filesystem in question uses the page cache, anyway.
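One way to see this for yourself is to time two consecutive reads. The sketch below is in Python; `time_read` is a made-up helper that reads and discards the file much like `cat file > /dev/null` would. The first read is only "cold" if the file was not already cached, for example after root has cleared the cache through `/proc/sys/vm/drop_caches`:

```python
import time

def time_read(path, chunk_size=1 << 20):
    """Read the whole file, discard the data, and time it.

    Returns (bytes_read, elapsed_seconds).
    """
    start = time.perf_counter()
    total = 0
    # buffering=0 avoids Python's own buffer; the kernel page cache
    # is still used underneath.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start
```

Calling `time_read(path)` twice in a row and comparing the elapsed times gives a rough picture; the actual numbers depend on the filesystem, the hardware and current memory pressure.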

caf
  • To measure the difference, the cache can be cleared by writing to `/proc/sys/vm/drop_caches`. See `proc(5)`. – maxelost May 23 '11 at 08:26
1

It will be fast as hell, but it won't be a no-op (if it were, there would be legit reasons for syscalls to do unexpected things instead of their promised functions...). However, depending on the filesystem driver used, and the kernel options, you could be running close to the memory bandwidth.

sehe