Linux tmpfs and out of memory

Question

I wrote a script that constantly writes a lot of data to a tmpfs partition. The size of this partition is 40% of RAM, and the data in it never exceeds 60% of the partition's capacity. Although the setup should be fine in theory, when I monitor the server during the day with "free -m" I see free RAM drop steadily until the server starts swapping, runs out of memory, and crashes.

This is my /etc/fstab entry:

tmpfs /home/tmpdata tmpfs defaults,size=40%,gid=1000,uid=1000,mode=0777 0 0

My system is Debian 8.3 on a dedicated server with 64GB of RAM. I suspect that the RAM never gets freed when the data changes, for example when a file is deleted.
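One way to test this suspicion is to ask the filesystem itself how much space it considers used, rather than summing the visible files. The sketch below does that with `os.statvfs`. The helper names `tmpfs_usage` and `used_fraction` are made up for illustration, and the mount point `/home/tmpdata` is the one from the fstab entry above.

```python
import os

def tmpfs_usage(path):
    """Return (used_bytes, total_bytes) for the filesystem containing path."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    # f_bfree counts free blocks, so deleted-but-still-open files
    # remain part of "used" until their last handle is closed.
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    return used, total

def used_fraction(used, total):
    """Fraction of the filesystem in use, or 0.0 for an empty filesystem."""
    return used / total if total else 0.0
```

Calling `tmpfs_usage("/home/tmpdata")` on the server and comparing the result with the total size of the files visible in the directory would show whether deleted files that are still held open are occupying space.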

This is the output of two consecutive runs of free -m:

root@xxxx:~# free -m
             total       used       free     shared    buffers     cached
Mem:         64454      41792      22661       3884        280      39268
-/+ buffers/cache:       2243      62210
Swap:         1021          0       1021
root@xxxx:~# free -m
             total       used       free     shared    buffers     cached
Mem:         64454      41827      22626       3879        280      39272
-/+ buffers/cache:       2274      62179
Swap:         1021          0       1021

and cat /proc/meminfo:

root@xxx:~# cat /proc/meminfo
MemTotal:       66001072 kB
MemFree:        20659152 kB
MemAvailable:   59740116 kB
Buffers:          288776 kB
Cached:         42705492 kB
SwapCached:            0 kB
Active:         11959248 kB
Inactive:       32386536 kB
Active(anon):    4179904 kB
Inactive(anon):  1263864 kB
Active(file):    7779344 kB
Inactive(file): 31122672 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       1046520 kB
SwapFree:        1046520 kB
Dirty:               288 kB
Writeback:             8 kB
AnonPages:       1353516 kB
Mapped:           120216 kB
Shmem:           4091776 kB
Slab:             483428 kB
SReclaimable:     300772 kB
SUnreclaim:       182656 kB
KernelStack:        7696 kB
PageTables:        40440 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    34047056 kB
Committed_AS:    7712460 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      389912 kB
VmallocChunk:   34359130252 kB
HardwareCorrupted:     0 kB
DirectMap4k:        6924 kB
DirectMap2M:     2056192 kB
DirectMap1G:    67108864 kB

and cat /proc/swaps:

root@xxxx:~# cat /proc/swaps
Filename                                Type            Size    Used    Priority
/dev/sdb4                               partition       523260  0       -1
/dev/sda4                               partition       523260  0       -2

If you unlink a file which is in use, the file will only be deleted once it is no longer in use. But in that case it will still count towards the used space on the file system. So even if those files remain in use, they cannot consume more memory than the size of the tmpfs file system. You need to provide the actual output of `free -m` such that we can see if the numbers indicate any problem. On a properly configured system, the free memory is usually always close to 0, and free swap space is never close to 0. — kasperd, Feb 26 '16 at 13:28
Because if there is nothing else to use the memory for, it should be in use as disk cache. If memory usage varies, then there can be some time after memory has been freed before the disk cache has grown to fill it. That is not a problem. If memory usage (including cache) never grows to almost take up all memory, then it simply means you bought more memory than you needed, and you have wasted some money on a larger machine than you need. (And that is not the worst kind of problem for a system administrator to have.) — kasperd, Feb 26 '16 at 14:40
I've noticed that `free -m` and `cat /proc/meminfo | grep -i memfree` give two different results. Which is the right one? — Viktor Joras, Feb 26 '16 at 14:41
`free -m` is just formatting the numbers from `/proc/meminfo`. As long as you read the numbers correctly, you will get the same numbers from both locations. — kasperd, Feb 26 '16 at 14:42
In your case `22626` indicates that approximately 22GB of your memory is free. That's a reasonable amount of free memory for a machine with 64GB. If you look in `/proc/meminfo` you should find approximately `MemFree: 23169024 kB` if you were looking at the same time. Of the 41GB of memory in use, most of it is used for cache. The amount used for caching is `41827-2274` or `62179-22626` or `280+39272`, which works out to about 39GB. So you have plenty of memory for caching. — kasperd, Feb 26 '16 at 14:49
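The arithmetic in the comment above can be checked directly. This is only a worked example using the numbers from the second `free -m` run in the question; the variable names are chosen here for readability.

```python
# Values from the second `free -m` run in the question, all in MiB.
used, free = 41827, 22626
buffers, cached = 280, 39272
used_minus_cache = 2274    # "-/+ buffers/cache" row, used column
free_plus_cache = 62179    # "-/+ buffers/cache" row, free column

# Three ways to arrive at the memory used for buffers and cache.
cache_a = used - used_minus_cache
cache_b = free_plus_cache - free
cache_c = buffers + cached

# The MemFree value /proc/meminfo would report for the same moment, in kB.
memfree_kb = free * 1024

print(cache_a, cache_b, cache_c, memfree_kb)
```

The three results agree to within 1 MiB (39553, 39553 and 39552), presumably because `free -m` rounds each column separately. Note that on Linux the pages of a tmpfs file system are counted under cached (and under shared/`Shmem`), so not all of that figure is ordinary disk cache.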
How can I prevent swapping and eventual out-of-memory crashes if free memory has to be close to 0 by design? If free memory is not a useful indicator, where should I look to be sure the system is not approaching a crash? — Viktor Joras, Feb 26 '16 at 14:49
If the numbers from `free -m` always look similar to what you posted, then it indicates that you have plenty of memory for your workload. You'd probably experience a minor slowdown if you cut your memory in half. — kasperd, Feb 26 '16 at 14:50
According to your output from `free -m`, your server is not swapping. The output from `free -m` just before you start seeing problems and while you have problems, would be helpful. The amount of swap space allocated seems a bit small. You only have 1GB of swap allocated for a machine with 64GB of physical memory. Depending on your workload, that might be a problem, though it was not a problem at the time where you ran `free -m`. — kasperd, Feb 26 '16 at 14:53
Yes, at the moment the script that fills the tmpfs partition is inactive, and I won't start it again until I understand what is going on. When the server starts swapping it becomes unresponsive and I have to hard reboot it. — Viktor Joras, Feb 26 '16 at 14:57
I have added the output of `cat /proc/swaps` in the question. — Viktor Joras, Feb 26 '16 at 15:09
Next you need to dump the output of all of those commands every few seconds, while you run the problematic script again. And leave a `top` command running through ssh as well, such that you can see what `top` was outputting moments before the server became unresponsive. — kasperd, Feb 26 '16 at 15:12
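A minimal sketch of such a logger, assuming Python is available on the server. The log path `/var/log/memwatch.log`, the 5-second interval, the selection of fields, and all function names are assumptions made for illustration. Each sample is flushed and synced to disk, so that the last lines written before a hard reboot have a better chance of surviving.

```python
import os
import time

# Fields worth watching for this problem; the selection is an assumption.
FIELDS = ("MemFree", "MemAvailable", "Cached", "Shmem", "SwapFree")

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {name: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values

def format_sample(values, timestamp):
    """Build one log line; fields missing from the input are logged as -1."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp))
    cols = " ".join(f"{k}={values.get(k, -1)}" for k in FIELDS)
    return f"{stamp} {cols}\n"

def log_forever(path="/var/log/memwatch.log", interval=5):
    """Append a sample every `interval` seconds, syncing each one to disk."""
    with open(path, "a") as out:
        while True:
            with open("/proc/meminfo") as f:
                values = parse_meminfo(f.read())
            out.write(format_sample(values, time.time()))
            out.flush()
            os.fsync(out.fileno())  # push the sample to disk before sleeping
            time.sleep(interval)
```

Running `log_forever()` in the background while the problematic script is active leaves a timeline that can be read after the reboot. The log file should live on a real disk, not on the tmpfs mount.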
The problem is the crash occurs about every ~24 hours and sometimes at random. It could happen at 3.00AM or 3.00PM so I could not find a logical time pattern. Thank you very much for the help ! I will update this post as soon as I discover the problem. — Viktor Joras, Feb 26 '16 at 15:19
"*the servers starts swapping, the out of memory is reached and the system crashes.*" The problem itself should be described in great detail, not mentioned in one sentence with no details. — David Schwartz, Feb 26 '16 at 18:14
The crash seems to be unrelated to tmpfs usage. I suggest enabling network logging, so the kernel can send you something before the crash. Also, did you find anything suspicious in the server logs? — anx, Feb 26 '16 at 18:09
That is the problem: messages, kern.log and syslog contain nothing from the moment of the crash. There is a gap, as if someone had pulled the plug, and then the logs resume after the reboot. No errors or warnings are logged just before the crash. — Viktor Joras, Feb 26 '16 at 22:47
It happened again tonight at 1.00AM, but this time I was ready to monitor. `free -m` suddenly showed 30GB of free memory, when an instant earlier it had been near 300MB, while `cat /proc/swaps` showed increasing usage on both 500MB swap partitions. Once all of the swap space had been filled the system became unresponsive, the ssh shell got stuck and I had to hard reboot. Now my question is why the RAM was suddenly freed while the swap was being heavily used. — Viktor Joras, Feb 27 '16 at 07:28
Data over time can be helpful to see how quickly this is filling and when it swaps out. Such as writing `vmstat 1` to a file. — John Mahowald, Feb 28 '16 at 00:43
It turned out that disk cache memory is not being freed, as in this question: http://serverfault.com/questions/288319/linux-not-freeing-large-disk-cache-when-memory-demand-goes-up — Viktor Joras, Feb 28 '16 at 08:27
