I have a server with 4 GB of RAM running two rsync tasks (100 GB each) every 5 minutes, plus some monitoring containers. Eventually the Linux cache eats all the RAM (caused by rsync), and if the monitoring stack launches a new container the system hangs because it begins to swap, since the cache is not freed.

Normally RAM usage is 512-700 MB; all the other RAM goes to cache and is not freed. Is there any way to fully disable the cache, instead of running `echo 3 > /proc/sys/vm/drop_caches` every 30 minutes or so?

Edit: cat /proc/meminfo

    MemTotal:        4026060 kB
    MemFree:          530636 kB
    MemAvailable:    3245008 kB
    Buffers:         1307920 kB
    Cached:           215712 kB
    SwapCached:            0 kB
    Active:          1730668 kB
    Inactive:         142704 kB
    Active(anon):     347880 kB
    Inactive(anon):     1196 kB
    Active(file):    1382788 kB
    Inactive(file):   141508 kB
    Unevictable:           0 kB
    Mlocked:               0 kB
    SwapTotal:             0 kB
    SwapFree:              0 kB
    Dirty:               344 kB
    Writeback:             0 kB
    AnonPages:        349864 kB
    Mapped:           126076 kB
    Shmem:              1252 kB
    KReclaimable:    1479664 kB
    Slab:            1550136 kB
    SReclaimable:    1479664 kB
    SUnreclaim:        70472 kB
    KernelStack:        4256 kB
    PageTables:         8500 kB
    NFS_Unstable:          0 kB
    Bounce:                0 kB
    WritebackTmp:          0 kB
    CommitLimit:     2013028 kB
    Committed_AS:    1363936 kB
    VmallocTotal:   34359738367 kB
    VmallocUsed:       10988 kB
    VmallocChunk:          0 kB
    Percpu:             1384 kB
    HardwareCorrupted:     0 kB
    AnonHugePages:         0 kB
    ShmemHugePages:        0 kB
    ShmemPmdMapped:        0 kB
    FileHugePages:         0 kB
    FilePmdMapped:         0 kB
    CmaTotal:              0 kB
    CmaFree:               0 kB
    HugePages_Total:       0
    HugePages_Free:        0
    HugePages_Rsvd:        0
    HugePages_Surp:        0
    Hugepagesize:       2048 kB
    Hugetlb:               0 kB
    DirectMap4k:      538612 kB
    DirectMap2M:     3655680 kB
    DirectMap1G:     2097152 kB

The server after 20 minutes of cache clear...

  • Are you sure that this is actually the problem? Normally only unused RAM will be used for caching, and if RAM is needed caches will be dropped automatically. – vidarlo Aug 05 '22 at 14:27
  • Please edit your question to add the output of `cat /proc/meminfo` from a system in a concerning state. – John Mahowald Aug 05 '22 at 14:31
  • question updated with the info requested – Carlos Rubio Aug 05 '22 at 15:28
  • **MemAvailable: 3245008 kB** That's *a lot*. Also, how and where it begins to swap, while you have zero SwapTotal? – Nikita Kipriyanov Aug 05 '22 at 15:51
  • The problem is exactly that: the system says a lot of memory is available because it's in cache, but it's not freed when needed. There is 0 swap because we prefer the server to hard crash instead of the weird behavior when swapping – Carlos Rubio Aug 08 '22 at 19:57
  • Consider trying the very latest stable kernel version, specifically Linux 6.1 with multi-generational LRU on. Could improve page reclaim. https://lwn.net/Articles/894859/ – John Mahowald Jan 18 '23 at 06:39
  • I'll definitely try that on my development server after finishing the upgrades from Ubuntu 18.04 to 22.04. In the last months we sidestepped this problem by using lsyncd for one of the tasks, but I can easily replicate it on the development server by restoring the database from backups. All memory is eaten by "cache" and not freed – Carlos Rubio Apr 21 '23 at 21:52

1 Answer

No, you cannot disable Linux file system cache. RAM costs money and power, might as well use it.

Do not use /proc/sys/vm/drop_caches in production, it will make performance worse by not using fast RAM. It is for debugging, such as simulating cold starts for storage testing.

A number to watch in /proc/meminfo is MemAvailable, which includes easily reclaimed caches. This should be a significant percentage of MemTotal. MemFree can be very low, that is not a problem.
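As a rough illustration of that check, here is a minimal Python sketch that computes MemAvailable as a percentage of MemTotal. The helper names `parse_meminfo` and `available_percent` are made up for this example, and `SAMPLE` is just three lines copied from the output in the question; on a real host the script reads `/proc/meminfo` instead.

```python
def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to their numeric value.

    Most fields are in kB; a few (such as HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def available_percent(fields):
    """MemAvailable as a percentage of MemTotal."""
    return 100.0 * fields["MemAvailable"] / fields["MemTotal"]


# Three lines copied from the question's output, used only as a fallback.
SAMPLE = """\
MemTotal:        4026060 kB
MemFree:          530636 kB
MemAvailable:    3245008 kB
"""

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # not on Linux: fall back to the sample above
    fields = parse_meminfo(text)
    print(f"MemAvailable is {available_percent(fields):.1f}% of MemTotal")
```

With the numbers from the question this comes out at roughly 80%, even though MemFree alone is only about 13% of MemTotal.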

Prove you have a user-visible performance issue, not just a belief that low free memory is bad. Add response time monitoring to your applications, time the commands you normally run by prefixing them with the `time` command, or do full profiling such as with `perf record -- <command>`.
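If you would rather collect timings from a script than from the shell, something like the following Python sketch could work. The function name `time_command` and the default of three runs are arbitrary choices for illustration; it measures wall-clock time only, which is a much cruder view than perf gives you.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=3):
    """Run argv `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so failed runs are not timed.
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Time the command given on the command line, or a no-op Python process.
    command = sys.argv[1:] or [sys.executable, "-c", "pass"]
    results = time_command(command)
    print(f"median {statistics.median(results):.3f}s over {len(results)} runs")
```

Repeating the run and looking at the median helps a little with noise; the first run after a cache drop will naturally be the slowest one.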

Where available, ensure pressure stall information is being collected. Good Linux host metrics monitoring tools can collect it, like netdata. PSI quantifies tasks stalling for memory, which is far more valuable than memory usage.
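If you have no monitoring tool at hand, the raw numbers are in `/proc/pressure/memory` on kernels built with PSI support. Below is a small Python sketch that parses that file; `parse_psi` is a name invented for this example, and the values in `SAMPLE` are placeholders that only show the file's layout, not measurements from any real system.

```python
def parse_psi(text):
    """Parse a /proc/pressure/* file into {"some": {...}, "full": {...}}.

    avg10/avg60/avg300 are percentages of time stalled over that window;
    total is the cumulative stall time in microseconds.
    """
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        kind, pairs = parts[0], parts[1:]
        result[kind] = {
            key: float(value)
            for key, value in (pair.split("=", 1) for pair in pairs)
        }
    return result


# Placeholder values, only here to illustrate the file format.
SAMPLE = """\
some avg10=1.50 avg60=0.75 avg300=0.20 total=123456
full avg10=0.50 avg60=0.25 avg300=0.05 total=45678
"""

if __name__ == "__main__":
    try:
        with open("/proc/pressure/memory") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # PSI not available here: show the sample instead
    for kind, values in parse_psi(text).items():
        print(f"{kind}: {values['avg10']:.2f}% of the last 10s stalled on memory")
```

The "some" line counts time in which at least one task was stalled on memory, and "full" counts time in which all non-idle tasks were stalled at once.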

Also measure whether drop_caches is a good idea. Collect application performance data now, at the status quo. Run drop_caches. Then measure again and quantify the change.
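One simple way to quantify the change is to compare the medians of the two sets of measurements. This is only a sketch: `percent_change` is a hypothetical helper, and the numbers in the example are placeholders, not results from any system.

```python
import statistics


def percent_change(before, after):
    """Percent change in the median from `before` to `after`.

    Positive means the metric got larger; for response times that means slower.
    """
    base = statistics.median(before)
    return 100.0 * (statistics.median(after) - base) / base


if __name__ == "__main__":
    # Placeholder response times in seconds, purely for illustration.
    status_quo = [10.0, 12.0, 11.0]
    after_drop_caches = [13.2, 14.0, 12.0]
    change = percent_change(status_quo, after_drop_caches)
    print(f"median response time changed by {change:+.1f}%")
```

A handful of samples like this is not statistically meaningful; collect enough runs on both sides that the difference is clearly larger than the run-to-run noise.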

Diving deep on performance can be a lot of work: time consuming, unfortunately, but hopefully challenging and rewarding.

John Mahowald
  • just want to mention `/proc/sys/vm/drop_caches` does not make your system slower if you clear it, in fact I'm doing it every 10 seconds to improve my system's performance, to prevent cache from filling up my unused RAM fully and slowing my system to a crawl, I only wish I could disable it completely. In addition, if anything makes your system even slower, that would be disk cache in swap, writing the optimizations back to disk. – Tcll Jan 17 '23 at 20:25
  • also when you say RAM costs power, writing to RAM only uses more power from CPU cycles, why not save that extra power and reserve the RAM for more important tasks than clearing it for what a program wants to use, that way it keeps the system fast. OR if you PREFER to use disk cache, only fill the RAM up to 0.75% with cache data to keep the system running smoothly, and disable swap cache completely to prevent stuttering from rapid I/O to disk, undoing what disk cache is supposed to preserve. – Tcll Jan 17 '23 at 20:39
  • Sure, you have used drop_caches without major incident. My doubt is that you can quantify a significant improvement in end-user performance metrics and/or memory PSI, by only changing if you drop_caches or not. Due to repeatedly throwing away cached data and forcing slower storage I/O. If you can, that could be another interesting question. Linux memory management has plenty of silly things, but almost certainly there's a better way to manage your situation. – John Mahowald Jan 18 '23 at 07:03
  • reading from disk every time is not really that slow; in fact, when the disk cache is full (RAM only) the program loads slower than just reading from disk, whereas if you keep the cache below 75% total use, the program loads as the optimization intends it to. It's only when disk cache is swapped that it's exponentially slower, to the point that it slows down overall system operation... but there should be a way to just disable it entirely and make it function like WinXP. I personally have no issues loading from disk every time, in fact I prefer it. – Tcll Jan 18 '23 at 17:26
  • This is not the case: the only task of this server is to keep files synchronized, plus light monitoring and a task once a day. The trouble is that the kernel does NOT free the cache, so if any memory is requested the server crashes. We prefer the hard crash to the weirdness associated with swap usage. – Carlos Rubio Apr 21 '23 at 21:48