
Say two different processes each open a different file. Normally, each file would have its own inode, and each inode would have its own struct address_space (the structure that tracks where the file's page cache pages are in memory).

But let's say I know that these files were initially identical. I want to come up with a way to share their cached pages to the extent possible.

I was considering these strategies:

  1. Add a new field to struct address_space: a pointer to a "parent". Then, whenever I look for an existing page, I'll also look in the parent (if it exists). Whenever I write to a page, I will therefore need to fault and copy-on-write (C-O-W) the page into the file's own address_space. Both files will share the common parent.

  2. Group each set of related struct address_space objects in a linked list. Whenever I look for an existing page, search the entire linked list. In this scenario, though, "finding" a dirty page on a sibling's address_space would be disallowed. In other words, once a page gets dirty it can no longer be used as a shared copy. If anyone ever wrote data to the file, I would need to disassociate the address_spaces, and I would still need some sort of C-O-W behavior to make this work.
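To make strategy 1 concrete, here is a minimal userspace sketch of the lookup and C-O-W logic I have in mind. It is not kernel code: `struct toy_mapping`, `toy_find_page`, `toy_get_page_for_write`, and the two `TOY_` constants are hypothetical names made up for illustration, and the sketch ignores locking, reference counting, and eviction entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_NR_PAGES 8     /* hypothetical: pages per file in this toy model */
#define TOY_PAGE_SIZE 16   /* hypothetical: tiny page size for illustration */

/* Toy stand-in for struct address_space; NOT the kernel structure. */
struct toy_mapping {
    struct toy_mapping *parent;   /* shared read-only parent, or NULL */
    char *pages[TOY_NR_PAGES];    /* NULL means "not cached here" */
};

/* Read lookup: try the private cache first, then fall back to the parent. */
static const char *toy_find_page(const struct toy_mapping *m, int idx)
{
    if (m->pages[idx])
        return m->pages[idx];
    if (m->parent)
        return m->parent->pages[idx];
    return NULL;
}

/* Write lookup: copy the parent's page into the private cache before use. */
static char *toy_get_page_for_write(struct toy_mapping *m, int idx)
{
    if (!m->pages[idx]) {
        char *copy = calloc(1, TOY_PAGE_SIZE);
        if (!copy)
            return NULL;
        /* C-O-W step: start from the shared contents if the parent has them. */
        if (m->parent && m->parent->pages[idx])
            memcpy(copy, m->parent->pages[idx], TOY_PAGE_SIZE);
        m->pages[idx] = copy;
    }
    return m->pages[idx];
}
```

The point of the sketch is that the parent is never written: a write always lands in a private copy, so the other file keeps seeing the original shared page.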

Can anyone tell me:

  • Are either or both of these ideas sound?
  • What things in particular should I watch out for?

As a point of reference, I am doing a custom kernel hack to save memory because on my system there are multiple identical files being opened (but not the same inode = not sharing pagecache).

EDIT: 3rd idea:

  • Keep a linked list of the "related" pagecache address_space structs, and every time we read from disk, update every address_space in the list that's open. Opening a new related file would then have to trigger a large bulk copy of the existing cached page entries into the new address_space, skipping any dirty pages.
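A rough userspace sketch of this third idea, again with made-up names (`struct grp_mapping`, `grp_publish_read`, `grp_join`, `GRP_NR_PAGES` are all hypothetical). It only models the bookkeeping. In the real kernel each struct page points back at a single address_space through `page->mapping`, so sharing one page between several mappings would need more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GRP_NR_PAGES 4   /* hypothetical: pages per file in this toy model */

/* Toy stand-in for one address_space in a group of related files. */
struct grp_mapping {
    struct grp_mapping *next;          /* next related mapping, or NULL */
    const char *pages[GRP_NR_PAGES];   /* cached page, or NULL if absent */
    bool dirty[GRP_NR_PAGES];          /* true if this mapping's copy is private */
};

/* After a disk read, publish the page to every mapping missing that index. */
static void grp_publish_read(struct grp_mapping *head, int idx, const char *page)
{
    for (struct grp_mapping *m = head; m; m = m->next)
        if (!m->pages[idx])
            m->pages[idx] = page;
}

/* Opening a new related file: inherit clean pages, skip dirty ones, then link. */
static void grp_join(struct grp_mapping **head, struct grp_mapping *newm)
{
    for (struct grp_mapping *m = *head; m; m = m->next)
        for (int i = 0; i < GRP_NR_PAGES; i++)
            if (m->pages[i] && !m->dirty[i] && !newm->pages[i])
                newm->pages[i] = m->pages[i];
    newm->next = *head;
    *head = newm;
}
```

A mapping that already holds a private dirty copy at some index is left alone by `grp_publish_read`, and `grp_join` never hands a dirty page to the newcomer.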
Robert Martin
  • Your idea sounds interesting and reminds me of [KSM](http://www.kernel.org/doc/Documentation/vm/ksm.txt). But KSM doesn't merge pagecache pages, maybe there was no need or there's another reason. – cnicutar Mar 23 '12 at 21:58
  • Yes, similar to KSM. Except unlike KSM, I want to use special knowledge of what's going on to catch and share pages as they happen, rather than comparing the data later and realizing they're the same. – Robert Martin Mar 23 '12 at 22:27
  • @RobertMartin: I haven't looked at that part of the kernel since Unix, but aren't those separate `struct file` pointing to the same `inode`? – wallyk Mar 23 '12 at 23:02
  • If they're the same inode, things take care of themselves, yeah. But in my case, I have _different_ inodes but identical file contents. It's because I have a union filesystem and I'm opening the files 'rw' – Robert Martin Mar 23 '12 at 23:11
