
I have a (large) system whose cache behavior I don't understand. It seems that the cache ran the system out of memory. How is this possible? Doesn't the kernel clean up the cache when there's memory pressure?

One of our users ran a `cp *` between two mount points, and there were a lot of large files on that mount point. Within 20 minutes of starting the cp, free memory had dropped from 190G to 3G, which is fine: the kernel was using the memory for cache. However, after several hours of normal operation, page allocation errors started appearing and the system eventually hit a kernel panic. Shouldn't the cache have been cleaned up? What am I missing?
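The next time it happens, a few quick checks should show whether the "used" memory is actually reclaimable page cache or something the kernel cannot free, and whether the failures are higher-order allocations (i.e. fragmentation). A minimal sketch using standard procfs files and the kernel log; exact /proc/meminfo field names vary a bit by kernel version:

    # How much of the "used" memory is reclaimable cache vs. anonymous/slab?
    grep -E 'MemFree|^Cached|Buffers|AnonPages|Slab|SReclaimable|SUnreclaim' /proc/meminfo

    # Are the failing allocations higher-order? Near-zero counts in the
    # right-hand columns (order 2 and up) point at fragmentation rather
    # than a genuine shortage of memory.
    cat /proc/buddyinfo

    # The kernel logs the failing order and GFP flags with each error.
    dmesg | grep -i 'page allocation failure'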

Sar output:

04:30:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
04:30:01 AM 190117832 338882216     64.06    821904  98713680  88991700     16.06
04:40:01 AM  94106312 434893736     82.21    822088 194964388  89050160     16.07
04:50:01 AM   3014152 525985896     99.43    821300 286266580  87867416     15.86
05:00:01 AM   4209852 524790196     99.20    801136 286031744  88871652     16.04
05:10:01 AM   4714172 524285876     99.11    801344 287214104  88159636     15.91
05:20:01 AM   5400216 523599832     98.98    805096 286247064  90627168     16.35
05:30:01 AM   5511400 523488648     98.96    805416 287312192  88569240     15.98
05:40:01 AM   5650476 523349572     98.93    805236 287706616  88528168     15.98
05:50:01 AM   5691756 523308292     98.92    805700 288392120  88720760     16.01
06:00:01 AM   5445876 523554172     98.97    805884 288012648  88708384     16.01
06:10:01 AM   5509688 523490360     98.96    805752 288311060  88776544     16.02
06:20:01 AM   5403656 523596392     98.98    805992 288453688  88767020     16.02
06:30:01 AM   5439672 523560376     98.97    806224 289253584  87783180     15.84
06:40:01 AM   5454604 523545444     98.97    806540 289048948  88136728     15.90
06:50:01 AM   5542920 523457128     98.95    806900 289134376  87709596     15.83
07:00:01 AM   6186992 522813056     98.83    807312 288270024  88280072     15.93
07:10:02 AM   6160340 522839708     98.84    807568 288291456  88446384     15.96
07:20:01 AM   6178224 522821824     98.83    807832 288347544  88303360     15.93
07:30:01 AM   5910668 523089380     98.88    808076 288161444  88938156     16.05
07:40:01 AM   5986916 523013132     98.87    808332 287596868  89381608     16.13
07:50:01 AM   5987924 523012124     98.87    808524 287766636  88892100     16.04
08:00:01 AM   6176552 522823496     98.83    808628 286743392  90363500     16.31
08:10:01 AM   6307184 522692864     98.81    808780 286462580  90791068     16.38
08:20:01 AM   6016008 522984040     98.86    808972 286270564  91520944     16.52
08:30:01 AM   6066140 522933908     98.85    809212 285822480  92202952     16.64
08:40:01 AM   6122212 522877836     98.84    809340 285830432  92116260     16.62
08:50:01 AM   6215360 522784688     98.83    809420 285934580  91682044     16.54
09:00:01 AM   6144172 522855876     98.84    809504 285484836  92590744     16.71
09:10:01 AM   6216228 522783820     98.82    809612 285656692  92262828     16.65
09:20:01 AM   6098812 522901236     98.85    809968 286012856  91611356     16.53
09:30:01 AM   6082072 522917976     98.85    811316 285738912  92147848     16.63
09:40:01 AM   6026820 522973228     98.86    811232 285632132  92670104     16.72
09:50:01 AM   5993500 523006548     98.87    810980 285557268  92356968     16.67
10:00:01 AM   6103388 522896660     98.85    811184 284710840  93299232     16.84
10:10:01 AM   6224652 522775396     98.82    811448 284692440  93302020     16.84
10:20:01 AM   6206720 522793328     98.83    811560 285226000  92498224     16.69
10:30:01 AM   5946256 523053792     98.88    823404 285104536  93159260     16.81
Average:     85971238 443028810     83.75    812425 204755681  90070609     16.25

10:32:29 AM       LINUX RESTART
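For reference, the table above can be regenerated (and narrowed to the window of interest) from the daily sysstat data file with something like the following; the file name is only an example, the day number will differ:

    # -r = memory utilization, -f = read a saved daily data file,
    # -s/-e = limit the report to a start/end time
    sar -r -f /var/log/sa/sa07 -s 04:00:00 -e 10:30:00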
– gfreestone
  • I believe you will find the fs cache is a red herring. It is perfectly normal for the fs cache to eventually consume nearly all free memory, and you never even reached 17% commit. If, however, you want to rule it out, you can set `vm.vfs_cache_pressure = 10000` in /etc/sysctl.conf and run `sysctl -e -p` so the kernel prefers to reclaim the dentry/inode caches and keep the page cache (a concrete sketch follows these comments). More likely, something is spiking your memory utilization inside a 10-minute sar window and you aren't seeing it. Look for programs in top with a very high VIRT size and no hard memory constraints. What programs are you running on this machine? – Aaron Mar 07 '16 at 21:19
  • The system runs Oracle databases. – gfreestone Mar 07 '16 at 21:30
  • In that case, you may want to change the sysstat cron job to gather sar data more frequently; it will take a little more disk space in /var/log/sa/, just FYI. On RHEL that would be /etc/cron.d/sysstat. You might also look at what queries were run in that last 10-minute window, if you have query logging enabled. There are probably more things you can do to find the root cause, but I am not a DBA. By chance, do any DB backups occur around that time? – Aaron Mar 07 '16 at 21:36
  • There aren't any backups running. The main reason we suspect the cp is that this has happened twice in exactly the same way: page allocation errors 5-6 hours after the same large cp. – gfreestone Mar 07 '16 at 21:39
  • One other remote possibility is memory fragmentation. I forget the Oracle-specific term, but you may want to pre-allocate the memory in Oracle so that the FS and page cache don't prevent contiguous allocation. This also gets into NUMA tuning and huge TLB pages, but there are many docs on Oracle's site for system tuning (sysctl.conf, kernel options, etc.). – Aaron Mar 07 '16 at 21:41
  • We already have hugepages tuned to the needs of this system. We're grasping at straws trying to figure out what happened. – gfreestone Mar 07 '16 at 22:12
  • I think changing sysstat to collect every minute may help, but I don't know what else to add. – Aaron Mar 07 '16 at 22:15
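Putting Aaron's two suggestions into concrete form, assuming a RHEL-style layout: raise `vm.vfs_cache_pressure` so the kernel reclaims the dentry/inode caches more aggressively, and make sysstat sample every minute so a short spike actually shows up. This is only a sketch; the sa1 path and the default cron line are assumptions and should be checked against the local /etc/cron.d/sysstat.

    # /etc/sysctl.conf -- strongly prefer reclaiming the dentry/inode caches
    vm.vfs_cache_pressure = 10000

    # apply it without a reboot (-e ignores unknown keys, -p loads the file)
    sysctl -e -p

    # /etc/cron.d/sysstat -- collect every minute instead of every 10 minutes;
    # the stock RHEL line is usually "*/10 * * * * root /usr/lib64/sa/sa1 1 1"
    * * * * * root /usr/lib64/sa/sa1 1 1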

0 Answers