Linux Page Cache Replacement

Question

I have two PostgreSQL databases named data-1 and data-2 that sit on the same machine. Both databases keep 40 GB of data, and the total memory available on the machine is 68GB.

I started data-1 and data-2, and ran several queries to go over all their data. Then, I shut down data-1 and kept issuing queries against data-2. For some reason, the OS still holds on to large parts of data-1's pages in its page cache, and reserves about 35 GB of RAM to data-2's files. As a result, my queries on data-2 keep hitting disk.

I'm checking page cache usage with fincore. When I run a table scan query against data-2, I see that data-2's pages get evicted and put back into the cache in a round-robin manner. Nothing happens to data-1's pages, although they haven't been touched for days.

Does anybody know why data-1's pages aren't evicted from the page cache? I'm open to all kind of suggestions you think it might relate to problem.

This is an EC2 m2.4xlarge instance on Amazon with 68 GB of RAM and no swap space. The kernel version is:

$ uname -r
3.2.28-45.62.amzn1.x86_64

Edit-1:

It seems that there is no NUMA configuration:

$ dmesg | grep -i numa
[    0.000000] No NUMA configuration found

Edit-2:

I used page-types tool in Linux kernel source tree to monitor page cache statuses. From the results I conclude that:

data-1 pages are in state : referenced,uptodate,lru,active,private
data-2 pages are in state : referenced,uptodate,lru,mappedtodisk

It looks like the kernel is restricting a process to a single NUMA node. You can search for ways to tweak this using google, I'm not familiar with NUMA tools. — cdleonard, Nov 20 '12 at 11:21
This could very well be. I get the following when I check for NUMA page reclaim mode. Are there any other flags that I should be looking at? $ cat /proc/sys/vm/zone_reclaim_mode 0 — metdos, Nov 20 '12 at 12:13
It looks like NUMA is disabled. This odd behavior might be controlled by the DB server instead. — cdleonard, Nov 20 '12 at 12:51
PostgreSQL manages 1 GB of shared memory. Apart from that, it completely relies on the filesystem cache to manage memory available on the machine (this is different than other databases, say Oracle). In this particular case, data-1 has been disabled for days; and couldn't have touched any of the memory pages. — metdos, Nov 20 '12 at 13:01

score 1 · Answer 1 · answered Nov 20 '12 at 12:30

1

Take a look at the cpusets you have configured in /dev/cpusets. If you have multiple directories in here then you have multiple cpusets, and potentially multiple memory nodes.

The cpusets mechanism is documented in detail here: http://www.kernel.org/doc/man-pages/online/pages/man7/cpuset.7.html

answered Nov 20 '12 at 12:30

dwurf

12,393
6
30
42

I don't have /dev/cpuset, but I have /dev/cpu . Can I say this machine doesn't have NUMA enabled? – metdos Nov 20 '12 at 12:42
1

it can be checked using #dmesg | grep -i numa . you will see something like [ 0.000000] No NUMA configuration found – Ranjith Ruban Nov 20 '12 at 16:57

Linux Page Cache Replacement

1 Answers1