I have a (large) system that has cache behavior that I don't understand. It seems that the cache ran the system out of memory. How is this possible? Doesn't the kernel clean up the cache when there's memory pressure?
One of our users ran a 'cp *' between two mount points. There were a lot of large files on that mount point. Within 20 minutes of starting the cp, Free memory had dropped from 190G to 3G. Which is fine, the kernel was using the cache. However after several hours of working there began to be page_allocation_errors and the system eventually experienced a kernel panic. Shouldn't the cache have been cleaned up? What am I missing?
Sar output:
04:30:01 AM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit
04:30:01 AM 190117832 338882216 64.06 821904 98713680 88991700 16.06
04:40:01 AM 94106312 434893736 82.21 822088 194964388 89050160 16.07
04:50:01 AM 3014152 525985896 99.43 821300 286266580 87867416 15.86
05:00:01 AM 4209852 524790196 99.20 801136 286031744 88871652 16.04
05:10:01 AM 4714172 524285876 99.11 801344 287214104 88159636 15.91
05:20:01 AM 5400216 523599832 98.98 805096 286247064 90627168 16.35
05:30:01 AM 5511400 523488648 98.96 805416 287312192 88569240 15.98
05:40:01 AM 5650476 523349572 98.93 805236 287706616 88528168 15.98
05:50:01 AM 5691756 523308292 98.92 805700 288392120 88720760 16.01
06:00:01 AM 5445876 523554172 98.97 805884 288012648 88708384 16.01
06:10:01 AM 5509688 523490360 98.96 805752 288311060 88776544 16.02
06:20:01 AM 5403656 523596392 98.98 805992 288453688 88767020 16.02
06:30:01 AM 5439672 523560376 98.97 806224 289253584 87783180 15.84
06:40:01 AM 5454604 523545444 98.97 806540 289048948 88136728 15.90
06:50:01 AM 5542920 523457128 98.95 806900 289134376 87709596 15.83
07:00:01 AM 6186992 522813056 98.83 807312 288270024 88280072 15.93
07:10:02 AM 6160340 522839708 98.84 807568 288291456 88446384 15.96
07:20:01 AM 6178224 522821824 98.83 807832 288347544 88303360 15.93
07:30:01 AM 5910668 523089380 98.88 808076 288161444 88938156 16.05
07:40:01 AM 5986916 523013132 98.87 808332 287596868 89381608 16.13
07:50:01 AM 5987924 523012124 98.87 808524 287766636 88892100 16.04
08:00:01 AM 6176552 522823496 98.83 808628 286743392 90363500 16.31
08:10:01 AM 6307184 522692864 98.81 808780 286462580 90791068 16.38
08:20:01 AM 6016008 522984040 98.86 808972 286270564 91520944 16.52
08:30:01 AM 6066140 522933908 98.85 809212 285822480 92202952 16.64
08:40:01 AM 6122212 522877836 98.84 809340 285830432 92116260 16.62
08:50:01 AM 6215360 522784688 98.83 809420 285934580 91682044 16.54
09:00:01 AM 6144172 522855876 98.84 809504 285484836 92590744 16.71
09:10:01 AM 6216228 522783820 98.82 809612 285656692 92262828 16.65
09:20:01 AM 6098812 522901236 98.85 809968 286012856 91611356 16.53
09:30:01 AM 6082072 522917976 98.85 811316 285738912 92147848 16.63
09:40:01 AM 6026820 522973228 98.86 811232 285632132 92670104 16.72
09:50:01 AM 5993500 523006548 98.87 810980 285557268 92356968 16.67
10:00:01 AM 6103388 522896660 98.85 811184 284710840 93299232 16.84
10:10:01 AM 6224652 522775396 98.82 811448 284692440 93302020 16.84
10:20:01 AM 6206720 522793328 98.83 811560 285226000 92498224 16.69
10:30:01 AM 5946256 523053792 98.88 823404 285104536 93159260 16.81
Average: 85971238 443028810 83.75 812425 204755681 90070609 16.25
10:32:29 AM LINUX RESTART