I'm creating a web application running on a Linux server. The application constantly accesses a 250K file: it loads it into memory, reads it, and sends some info back to the user. Since this file is read all the time, my client suggests using something like memcache to cache it in memory, presumably because it will make read operations faster.

However, I'm thinking that the Linux filesystem is probably already caching the file in memory since it's accessed frequently. Is that right? In your opinion, would memcache provide a real improvement? Or is it going to do the same thing that Linux is already doing?

I'm not really familiar with either Linux or memcache, so I would really appreciate it if someone could clarify this.

laurent
  • There is no "the Linux filesystem." There are many filesystems that Linux supports. – cdhowie Aug 19 '11 at 07:59
  • It sounds like `mmap` might be even better and leave the memory all up to the kernel. – Steve-o Aug 19 '11 at 08:05
  • As always, when asking this kind of question, there are answers like "you're doing it wrong". You don't know the details of the project, so please spare me the patronizing attitude. – laurent Aug 19 '11 at 08:12
  • Optimisation on the basis of guesswork, or even informed opinion from SO, is almost always a waste of time. Profile the app and see if the file access is a bottleneck. It probably isn't, but no one here can tell you that for sure. I would say that 250k sounds like an awfully small amount of memory to be worrying about, however. – James Gaunt Aug 19 '11 at 17:22

5 Answers


Yes, if you do not modify the file each time you open it.

Linux will hold the file's data in the page cache (as copy-on-write pages in memory), so "loading" the file into memory should be very fast (a page-table update at worst).

Edit: Though, as cdhowie points out, there is no "Linux filesystem". However, I believe the relevant code lives in Linux's memory management and is therefore independent of the particular filesystem in question. If you're curious, you can read the Linux source on handling vm_area_struct objects, mainly in linux/mm/mmap.c.
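You can see the page cache at work with a quick timing sketch (a rough illustration, not from the original answer -- the scratch file stands in for the 250K file in the question; the second read is served from memory):

```python
import os
import tempfile
import time

# Create a ~250 KB scratch file as a stand-in for the file in the question.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(os.urandom(250 * 1024))

def timed_read(p):
    """Read the whole file and return (bytes_read, seconds_elapsed)."""
    start = time.perf_counter()
    with open(p, "rb") as f:
        data = f.read()
    return len(data), time.perf_counter() - start

size, first = timed_read(path)   # may touch the disk (or not -- we just wrote it)
size, second = timed_read(path)  # almost certainly served from the page cache
print(f"{size} bytes; first read {first * 1e3:.3f} ms, second {second * 1e3:.3f} ms")
os.remove(path)
```

On a typical machine both reads complete in well under a millisecond once the file is cached, which is the point of the answer: the kernel is already doing the caching for you.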

Robert Martin
  • To add to the answer, you could use `vmtouch` to make sure the file is in memory; in fact, you can force the file to always be kept in the filesystem cache. See http://hoytech.com/vmtouch/ for details. – goblinjuice Jan 25 '14 at 06:15

As people have mentioned, mmap is a good solution here.

But a single 250k file is very small. You might want to read it in at startup and keep it in some sort of in-memory structure that matches what you want to send back to the user. E.g., if it is a text file, an array of lines might be a good choice.
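A minimal sketch of that idea (the file contents and the `lookup` helper are made up for illustration): read the file once at module load, keep the parsed lines in memory, and serve every request from the in-memory copy.

```python
import os
import tempfile

# Stand-in for the real 250 KB file; in the application this would be a fixed path.
fd, DATA_PATH = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

# Loaded once at startup (module import), not on every request.
with open(DATA_PATH) as f:
    LINES = f.read().splitlines()
os.remove(DATA_PATH)  # the in-memory copy is all we need from here on

def lookup(n):
    """Serve a request from the in-memory structure -- no file I/O per request."""
    return LINES[n] if 0 <= n < len(LINES) else None

print(lookup(1))  # -> beta
```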

Bruce ONeel

The file should be cached, but make sure the noatime option is set on the mount; otherwise each read will update the file's access time, which has to be written back to disk.
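For reference, this is how the option would look in /etc/fstab (the device and mount point below are placeholders; note that recent kernels default to relatime, which already limits access-time writes):

```
/dev/sda1   /srv/www   ext4   defaults,noatime   0   2
```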


Yes, definitely. The kernel will keep accessed files cached in memory indefinitely, unless something else needs the memory.

You can control this behaviour (to some extent) with the posix_fadvise system call. See its man page for more details.

A read/write system call will still normally need to copy the data, so if this turns out to be a real bottleneck, consider using mmap(), which can avoid the copy by mapping the cached pages directly into the process.
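Both ideas can be sketched in Python (the scratch file and its contents are invented for the example; posix_fadvise is Linux/POSIX-only, hence the hasattr guard): posix_fadvise hints the kernel to pre-populate the cache, and mmap exposes the cached pages to the process directly instead of copying them through read().

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"hello page cache")

# Hint that we will need the whole file soon (offset 0, length 0 = to EOF).
if hasattr(os, "posix_fadvise"):
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)

# Map the file: reads go straight to the page-cache pages, no copy into a buffer.
with mmap.mmap(fd, 0, prot=mmap.PROT_READ) as m:
    data = bytes(m[:5])
print(data)  # -> b'hello'
os.close(fd)
os.remove(path)
```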

MarkR

I guess putting that file on a ramdisk (tmpfs) could give enough of an advantage without big modifications, unless you are really serious about response times on the microsecond scale.

shr