
I'm currently experimenting with the kernel parameters found in /proc/sys/vm, especially dirty_ratio and dirty_background_ratio.

The kernel doc has the following explanations for both:

dirty_background_ratio

Contains, as a percentage of total available memory that contains free pages and reclaimable pages, the number of pages at which the background kernel flusher threads will start writing out dirty data.

and

dirty_ratio

Contains, as a percentage of total available memory that contains free pages and reclaimable pages, the number of pages at which a process which is generating disk writes will itself start writing out dirty data.
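To make the two definitions concrete, here is a minimal Python sketch that turns the two percentages into page counts. It is a simplification, not the kernel's code. It assumes "available memory" is simply free plus reclaimable pages, as the documentation above says, and it ignores the vm.dirty_bytes / vm.dirty_background_bytes variants and per-cgroup limits. The clamp at the end mirrors what, to my knowledge, mm/page-writeback.c does when the background threshold is not below the hard one.

```python
from __future__ import annotations


def dirty_thresholds(free_pages: int, reclaimable_pages: int,
                     background_ratio: int, dirty_ratio: int) -> tuple[int, int]:
    """Return (background_threshold, dirty_threshold) in pages (simplified model)."""
    available = free_pages + reclaimable_pages          # "free + reclaimable" per the docs
    background = available * background_ratio // 100    # flusher threads start here
    dirty = available * dirty_ratio // 100              # writing processes start paying here
    if background >= dirty:                             # background must stay below the hard limit
        background = dirty // 2
    return background, dirty


# The questioner's settings (10 and 20) with 1,000,000 available pages.
print(dirty_thresholds(600_000, 400_000, 10, 20))  # (100000, 200000)
```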

On my Linux system, dirty_background_ratio is 10 and dirty_ratio is 20. I understand that the difference is who writes the dirty data. So if my used memory reaches 10%, the kernel starts writing back, and 20% should never be reached.

My question now is: does the higher value of dirty_background_ratio and dirty_ratio have any meaning, or is it just a matter of "which is the lower value and who has it"?

Alexander Azarov
happyMOOyear

3 Answers


Does the higher value of dirty_background_ratio and dirty_ratio have any meaning, or is it just a matter of "which is the lower value and who has it"?

In simpler words:

vm.dirty_background_ratio is the percentage of available memory (free plus reclaimable pages) which, when dirty, causes the kernel's background flusher threads to start writing data to the disk.

vm.dirty_ratio is the percentage of available memory which, when dirty, causes the process doing the writes to block and write out dirty pages to the disk itself.

These tunables depend on what your system is running; if you run a large database, it's recommended to keep these values low to avoid I/O bottlenecks when the system load increases.

For example:

vm.dirty_background_ratio=10
vm.dirty_ratio=15

In this example, when dirty pages exceed vm.dirty_background_ratio=10, I/O starts, i.e. they start getting flushed / written to the disk. When the total number of dirty pages exceeds vm.dirty_ratio=15, all writes get blocked until some of the dirty pages have been written to disk. You can think of vm.dirty_ratio=15 as the upper limit.
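The example above can be restated as a tiny Python sketch. It is a deliberately simplified model of the behaviour this answer describes (an abrupt switch at each threshold); as the last answer in this thread explains, newer kernels throttle writers gradually instead. The `WriteState` names are hypothetical labels, not kernel terminology.

```python
from enum import Enum


class WriteState(Enum):
    NORMAL = "writes go to the page cache; no writeback forced yet"
    BACKGROUND = "flusher threads write back dirty pages; writers continue"
    BLOCKED = "the writing process must wait for dirty pages to reach disk"


def write_state(dirty_pct: float, background_ratio: int = 10,
                dirty_ratio: int = 15) -> WriteState:
    """Classify a dirty-memory percentage using the two thresholds (simplified)."""
    if dirty_pct > dirty_ratio:
        return WriteState.BLOCKED      # above the upper limit: writers block
    if dirty_pct > background_ratio:
        return WriteState.BACKGROUND   # above the lower limit: background flushing
    return WriteState.NORMAL


for pct in (5, 12, 16):
    print(pct, write_state(pct).name)  # 5 NORMAL / 12 BACKGROUND / 16 BLOCKED
```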

PhilR
askb
  • So let me rephrase that, just to see if I understood correctly. If the dirty_background_ratio is reached, the kernel starts doing the writebacks in the background, but applications can still write to the page cache without blocking. If dirty_ratio is reached, applications block on writing until dirty_ratio is no longer reached. Is that correct? – happyMOOyear Jan 12 '15 at 12:48
  • I accepted your answer because it answers my question and helped me a lot. Thank you! Just one extra question: Is there some place where this behavior is documented? – happyMOOyear Jan 12 '15 at 13:59
  • "vm.dirty_ratio is the value that represents the percentage of MemTotal that can consume dirty pages before all processes must write dirty buffers back to disk and when this value is reached all I/O is blocked for any new writes until dirty pages have been flushed." http://www.sysxperts.com/home/announce/vmdirtyratioandvmdirtybackgroundratio ... you can also verify this behaviour from the code. – askb Jan 12 '15 at 14:17
  • Shouldn't "... vm.dirty_ratio=10 as the upper limit" be "... vm.dirty_ratio=15 as the upper limit"? – hbogert Jun 22 '16 at 07:31
  • For your statement that "the process doing writes would block": if I am using Java, which writes directly to the page cache, will that operation still have to block on real I/O when this ratio is reached? – JaskeyLam Nov 01 '16 at 08:21
  • @askb, curious about the statement "if you run a large database it's recommended to keep these values low to avoid I/O bottlenecks when the system load increases". Lowering dirty_background_ratio is pretty straightforward; however, for dirty_ratio, I guess a higher value should benefit the workload if memory is sufficient. Right? – Zaiping Bie Jul 09 '19 at 07:56
  • @ZaipingBie If you want to guarantee an *upper bound* of e.g. 1 second for a query, then you need to make sure that you never have more data in the cache than can be written to disk in 1 second. You do that by lowering the dirty ratio. – Jonathan Baldwin Feb 14 '21 at 05:19
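The arithmetic behind that last comment is simple: the dirty backlog must not exceed what the disk can write within the bound. A rough Python sketch, assuming a constant, known write throughput (real disks vary, and reads compete for the same device); the 100 MB/s and 8 GB figures are made up for illustration, and the function names are hypothetical.

```python
def max_dirty_bytes(write_bytes_per_s: float, bound_s: float) -> int:
    """Largest dirty backlog the disk can flush within bound_s seconds."""
    return int(write_bytes_per_s * bound_s)


def dirty_ratio_for_bound(write_bytes_per_s: float, bound_s: float,
                          available_bytes: float) -> float:
    """The same limit expressed as a percentage of available memory."""
    return 100 * max_dirty_bytes(write_bytes_per_s, bound_s) / available_bytes


# Hypothetical figures: 100 MB/s disk, 1 s bound, 8 GB of available memory.
print(max_dirty_bytes(100e6, 1.0))             # 100000000
print(dirty_ratio_for_bound(100e6, 1.0, 8e9))  # 1.25
```

Because the ratios are whole percentages, the result often falls between two settings on large-memory machines; vm.dirty_bytes and vm.dirty_background_bytes exist to set these limits in bytes instead.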

I have been intrigued by this very question and so experimented a bit on my Debian 7.10 system running Linux 3.2.0-4-amd64 using sysbench 0.4.12, modifying:

  • /proc/sys/vm/dirty_ratio
  • /proc/sys/vm/dirty_background_ratio
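For experiments like this one, it helps to change a tunable and reliably put it back afterwards. A minimal Python sketch, assuming it runs as root on Linux (writing to /proc/sys/vm requires that); `vm_tunable` is a hypothetical helper, and `run_benchmark` in the usage comment stands for whatever you measure with (sysbench here). Per the kernel documentation, writing dirty_ratio makes dirty_bytes read as 0 (and vice versa), so restoring only the ratio may not restore a configuration that used the byte-based limits.

```python
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def vm_tunable(name: str, value: int, base: str = "/proc/sys/vm"):
    """Temporarily set <base>/<name>, restoring the previous value on exit."""
    path = Path(base) / name
    old = path.read_text().strip()      # current value, e.g. "20"
    path.write_text(f"{value}\n")
    try:
        yield old
    finally:
        path.write_text(old + "\n")     # restore even if the benchmark raised


# Usage (as root), with a hypothetical run_benchmark():
# with vm_tunable("dirty_background_ratio", 5), vm_tunable("dirty_ratio", 40):
#     run_benchmark()
```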

These settings are a way to delay writing to disk. They are useful as long as you have applications that write infrequently or in small chunks (e.g. a web browser). If the only application on the system is generating data faster than the disk can sustain, then no setting matters: the writing will take as long as it has to.

Dirty Ratio (DR) causes the process whose writes push the number of dirty pages over the threshold to block. Dirty Background Ratio (DBR) controls writing dirty pages in the background. So if you have a low DBR and a higher DR, and all of your processes write in small chunks whose combined rate never exceeds the disk's supported write speed (e.g. 50 MB/s), you will find the system pretty responsive. This is impressive when you bear in mind that writing to RAM is usually about 100 times faster (5 GB/s)! This is the importance of DBR.
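A toy simulation can illustrate the point about DBR and DR. It is a sketch under heavy assumptions, not a model of the real writeback code: one-second ticks, a single writer, a disk that flushes a fixed number of MB per tick once dirty data is above the background threshold, no dirty_expire_centisecs timer, and a writer that blocks (carrying its unwritten data over) whenever a write would push dirty data above the hard limit. The function name and all numbers are hypothetical.

```python
def blocked_ticks(writes_mb, disk_mb_per_tick, background_mb, limit_mb):
    """Count the ticks in which the writer could not hand all its data to the cache."""
    dirty = 0.0      # MB currently dirty in the page cache
    pending = 0.0    # MB the writer still wants to write (it waits on this)
    blocked = 0
    for w in writes_mb:
        pending += w
        accepted = min(pending, max(limit_mb - dirty, 0.0))
        if accepted < pending:
            blocked += 1                 # write would cross the hard limit: writer waits
        dirty += accepted
        pending -= accepted
        if dirty > background_mb:        # background flushing kicks in
            dirty -= min(disk_mb_per_tick, dirty)
    return blocked


bursts = ([100] + [0] * 4) * 4                     # 100 MB every 5 s: 20 MB/s on average
print(blocked_ticks(bursts, 50, 10, 200))          # low DBR, high DR: 0
print(blocked_ticks(bursts, 50, 150, 200))         # high DBR: 2
print(blocked_ticks(bursts, 50, 10, 60))           # low DR: 4
print(blocked_ticks([80] * 50, 50, 10, 200) > 0)   # sustained 80 MB/s > disk: True
```

The last line reflects the point above: once the writer outpaces the disk for long enough, it blocks whatever the thresholds are.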

These parameters matter most when you care about applications that write infrequently. You don't want a process that writes a byte or reads a few KB to stall for 20 seconds because there is too much dirty data. This is why DR should not be too high. A moderate DR also ensures that some memory remains available to cache recently used data.

pdp

In the modern era, everything discussed here still applies, but the behavior is somewhat different.

In older kernels, the kernel would begin writing at vm.dirty_background_ratio (or once data was vm.dirty_expire_centisecs centiseconds old; the default is 3000, i.e. 30 seconds). It blocked at vm.dirty_ratio, sometimes debilitatingly, because it would tend to block until the cache had drained down to either vm.dirty_background_ratio or 0, either of which could of course take a very long time if they were set high.
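To put a number on "a very long time": under the old behaviour, a writer that hit vm.dirty_ratio could wait roughly as long as it takes to flush the backlog down to the lower mark. A back-of-the-envelope Python sketch, assuming a constant flush speed; the 16 GB and 100 MB/s figures and the function name are hypothetical.

```python
def drain_seconds(available_bytes: float, from_ratio: float, to_ratio: float,
                  flush_bytes_per_s: float) -> float:
    """Time to flush dirty data from from_ratio% down to to_ratio% of available memory."""
    backlog = available_bytes * (from_ratio - to_ratio) / 100
    return backlog / flush_bytes_per_s


# Hypothetical: 16 GB available, dirty_ratio=20, dirty_background_ratio=10, 100 MB/s disk.
print(drain_seconds(16e9, 20, 10, 100e6))  # down to the background ratio: 16.0
print(drain_seconds(16e9, 20, 0, 100e6))   # all the way down to 0: 32.0
```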

In newer kernels, a big effort has been made to avoid "jank" related to this caching behavior. Writing still begins at vm.dirty_background_ratio. But when dirty data gets about halfway between vm.dirty_background_ratio and vm.dirty_ratio, the kernel begins to apply small write delays (throttling the application's write speed). Between about 50% and 90% of that range there's a fairly minor slowdown; between 90% and 100% it ramps the speed down quickly, so that by 100% (cache at vm.dirty_ratio) the application's write speed matches the speed at which writes can be flushed out to the device. There are also heuristics to keep one massive writer from starving other apps trying to make small writes to the same device, i.e. to prevent the "copying a large file janks out the system" complaint (successfully, from what I can tell).
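The shape described above can be sketched as a piecewise function of where the dirty level sits between the two thresholds. This is only a toy reading of this answer, not the kernel's formula: the real logic lives in balance_dirty_pages() in mm/page-writeback.c and, as far as I know, uses a smoother curve plus per-device bandwidth estimates. The `mild` factor (how much the "fairly minor slowdown" costs) is a made-up parameter. The no-throttling first half matches, to my understanding, the kernel's "freerun" ceiling at the midpoint between the two thresholds.

```python
def allowed_write_rate(dirty, background, limit, app_rate, device_rate, mild=0.8):
    """Write rate a process is allowed at a given dirty level (toy model, not kernel code)."""
    target = min(app_rate, device_rate)       # rate to match once the cache is at the limit
    if dirty <= background:
        return app_rate
    pos = (dirty - background) / (limit - background)  # 0 at background, 1 at limit
    if pos >= 1.0:
        return target
    if pos < 0.5:
        return app_rate                       # first half: no throttling ("freerun")
    mild_rate = max(app_rate * mild, target)  # hypothetical end of the gentle zone
    if pos < 0.9:                             # 50-90 %: minor, linear slowdown
        return app_rate + (pos - 0.5) / 0.4 * (mild_rate - app_rate)
    return mild_rate + (pos - 0.9) / 0.1 * (target - mild_rate)  # 90-100 %: steep ramp


# Hypothetical units: thresholds in MB, rates in MB/s.
for d in (120, 160, 185, 195, 200):
    print(d, round(allowed_write_rate(d, 100, 200, 1000, 100)))
```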

One open question: I really don't know how it apportions the cache between, say, an NVMe drive writing 2 GB/s, a hard drive doing about 100 MB/s, and an old USB stick doing about 20 MB/s. It seems to work fine, though!

hwertz