
"...if a page has been modified and is thus dirty, it must be written back to disk to evict it, which is expensive." (In chapter 22 of OSTEP)

I don't understand why. To evict a dirty page from memory, it is moved to swap space, and later it is moved back. Why would it also need to be written to disk? That would mean two disk I/Os every time we evict a dirty page.

dazhu
  • What do you understand by 'swap space'? – Martin James Aug 19 '21 at 03:43
  • @MartinJames To support more memory than is physically available, we need some space on disk for moving pages back and forth. We generally call this space 'swap space', because the system can swap pages out of memory to it and swap pages into memory from it. – dazhu Aug 21 '21 at 03:02
  • @dazhu That swap space is only necessary for anonymous pages because file-backed pages already have a place on disk to store them, i.e. the original file they came from. – wxz Aug 21 '21 at 16:56

2 Answers


I think you're conflating two separate things. Swap space (a reserved region of the disk, not of memory) acts as the backing store for anonymous pages, i.e. pages that have no backing file. The statement you quoted is probably about a file-backed dirty page: a page whose contents came from a file on disk. Such a page never needs to go to swap space; it can simply be written back to its location in the file. It must still be written back, though, to preserve the new data.

If file-backed pages were evicted to swap space, as your post implies, you'd be right: it would waste disk I/O to first write the dirty page back to its file on disk and then also write it to swap space. However, file-backed pages are not evicted to swap space, so that double write never happens.


Swap space makes it easy to treat file-backed and anonymous pages the same way: both kinds of pages can be evicted to disk. The only difference is that anonymous pages are evicted to swap, whereas file-backed pages go back to their usual location in their file.

Furthermore, clean pages never need to be written back to disk, because an identical copy of them already exists on disk. Clean anonymous pages don't need to be written either, for a slightly different reason: an anonymous page that has never been written to is just a virtual allocation that is either not yet backed by physical memory or mapped to the single shared zero page, so there is nothing to swap. When such a page is first written, a copy-on-write (COW) page fault gives it its own physical frame, the page is marked dirty, and from then on it must be written to swap if it is evicted.
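The decision described above can be sketched as a small function. This is a simplified model, not how any real kernel is written: the `Kind` enum, the `Page` class, and the `writeback_target` and `writes_for_eviction` functions are hypothetical names, and a real kernel tracks far more state per page.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    ANON = "anonymous"    # e.g. heap or stack memory, no backing file
    FILE = "file-backed"  # contents came from a file on disk


@dataclass
class Page:
    kind: Kind
    dirty: bool


def writeback_target(page: Page) -> Optional[str]:
    """Return where an evicted page must be written, or None if no write is needed."""
    if not page.dirty:
        # Clean page: either an identical copy is already on disk, or (for a
        # never-written anonymous page) there is nothing worth saving.
        return None
    if page.kind is Kind.FILE:
        return "file"  # write back to the page's own location in its file
    return "swap"      # anonymous data has no file, so it goes to swap


def writes_for_eviction(page: Page) -> int:
    # At most ONE write per eviction: never "to the file and then to swap".
    return 0 if writeback_target(page) is None else 1


if __name__ == "__main__":
    for kind in Kind:
        for dirty in (False, True):
            page = Page(kind, dirty)
            print(kind.value, "dirty" if dirty else "clean", "->", writeback_target(page))
```

The point the model makes explicit is that every branch ends in at most one write, so a dirty page is written either to its file or to swap, never to both.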

wxz
  • I have only studied the first part of OS, virtualization, so I don't know some of the terminology in your answer, such as anonymous page and file-backed page. Maybe when I finish the book, I will understand your answer better! – dazhu Aug 19 '21 at 02:40
  • @dazhu I think the key to answering your question is that swap space resides on disk, so no matter what type of page it is (anonymous or file-backed), the page must be written back to disk if it is dirty, because a dirty page holds new information that isn't stored anywhere else yet. – wxz Aug 19 '21 at 05:54
  • @dazhu Or rather, any dirty memory in RAM contains new data that isn't stored on a hard drive, so it is at risk of being lost in a power outage. Therefore, if that memory is evicted, it must first be written to disk so that it can be accessed later. Swap space is a special type of disk space for one type of memory, but for now just think of "moved to swap space" as equivalent to "written back to disk." – wxz Aug 19 '21 at 14:42
  • When memory runs short, we evict a dirty page or a clean page to swap space. If a clean page is evicted, we need just one disk I/O to write it to swap space. If a dirty page is evicted, we need two disk I/Os: one to write it somewhere on disk, and another to write it to swap space. Do I understand correctly? – dazhu Aug 21 '21 at 10:31
  • @dazhu No, that's not correct. A clean page is a page that hasn't been modified at all from its original version on disk. So when a clean page is evicted, there's no need to do any I/O; the kernel just marks the page as invalid. A dirty page, on the other hand, contains new information that isn't stored on the backing disk. This is where I was trying to explain the two categories of pages: file-backed pages need one disk I/O to be written back to their file location on disk, and anonymous pages (memory from a process's heap, for instance, which isn't backed by a file) need one disk I/O to go to swap space. – wxz Aug 21 '21 at 16:51
  • @dazhu The whole goal is simply to make sure new information gets saved to disk, both because disk is bigger than RAM, so it has room for the data, and because it is non-volatile (it doesn't need power to retain data). Clean pages already have an exact copy on disk, which is why you can skip a disk I/O when they're evicted. Dirty pages need exactly one copy made, so they are written either back to their original file on disk or to swap space on disk, not both, depending on what type of page they are. – wxz Aug 21 '21 at 16:55

If a page's contents are not going to be needed later, there is no need to write them to disk (or some other backing medium) when the page is evicted.

Conversely, if the contents are going to be needed later, they must be written to disk (or some other backing medium) when the page is evicted, so that they can be loaded back into memory later without having changed. If they did change, that would be akin to memory corruption.

(I should add that the memory contents refer to the contents for the whole page of memory.)

Sparky
  • If the contents of a dirty page will be needed later, we can swap the page back into memory from swap space. Why would the contents of the dirty page change while in swap space? – dazhu Aug 19 '21 at 03:08
  • @dazhu - The backing media is for storage and retrieval only. Dirty pages are written to backing media. While stored in backing media, they do not change. When they are needed, they are loaded from backing media into memory. Swap space is merely a class of backing media. – Sparky Aug 19 '21 at 13:52
  • @dazhu - It may be helpful to think of things from the perspective of the backing media (such as a hard disk). Data is accessed by block, not by byte. Though it isn't required, you can often think of a memory page and a disk block as being the same size, say 4 kB. To update a block on disk, you must first load the entire block into memory, then update the memory, then write the entire block back to disk. – Sparky Aug 19 '21 at 14:08
  • "While stored in backing media, they do not change." So I think we need just one disk I/O to write a dirty page to swap space when memory runs short. But the quotation seems to imply two disk I/Os (the other one writing the dirty page somewhere else on disk). I don't know why we would need two disk I/Os. – dazhu Aug 21 '21 at 10:40
  • @dazhu - One disk IO is to write the contents of the dirty page that is being evicted to backing media. However, if the evicted page is not dirty, then no write is necessary. The other disk IO is to read a new page from backing media into memory. – Sparky Aug 21 '21 at 16:49
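Sparky's last point, that a page fault whose victim is dirty costs a write plus a read while a clean victim costs only the read, can be seen in a toy simulation. This is an illustrative sketch under stated assumptions: `ToyMemory` is a hypothetical class, replacement is plain FIFO, and every page is assumed to start out on disk, so each miss costs exactly one read.

```python
from collections import OrderedDict


class ToyMemory:
    """Hypothetical fixed-size physical memory with FIFO replacement that counts disk I/Os."""

    def __init__(self, frames: int):
        self.frames = frames
        self.resident = OrderedDict()  # page number -> dirty flag, oldest page first
        self.reads = 0
        self.writes = 0

    def access(self, page: int, write: bool = False) -> None:
        if page not in self.resident:
            if len(self.resident) == self.frames:
                # Memory is full: evict the oldest resident page (FIFO).
                _victim, victim_dirty = self.resident.popitem(last=False)
                if victim_dirty:
                    self.writes += 1  # dirty victim: its new data must be saved first
                # A clean victim is simply dropped; its copy on disk is still valid.
            self.reads += 1  # assumption: the requested page is read in from disk
            self.resident[page] = False
        if write:
            # Updating an existing key keeps its FIFO position.
            self.resident[page] = True


if __name__ == "__main__":
    mem = ToyMemory(frames=1)
    mem.access(1)              # miss: 1 read
    mem.access(2)              # miss, clean victim: 1 read, no write
    mem.access(2, write=True)  # hit: page 2 becomes dirty
    mem.access(3)              # miss, dirty victim: 1 write + 1 read
    print("reads:", mem.reads, "writes:", mem.writes)
```

With a single frame, the fault that evicts the clean page costs one I/O, while the fault that evicts the dirty page costs two; that extra write on the fault path is why evicting dirty pages is described as expensive.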