How can I specify multiple address_space structs for an inode?

Question

Say that two different processes each open a different file. Normally, each file would have its own inode, and each inode would have its own struct address_space (the structure that remembers where the file's page cache pages are in memory).

But let's say I know that these files were initially identical. I want to come up with a way to share their page cache pages to the extent possible.

I was considering these strategies:

1. Add a new field to struct address_space: a pointer to a "parent". Whenever I look for an existing page, I'll also look in the parent (if it exists). Whenever I write to a page, I will therefore need to fault and C-O-W the page into the file's own address_space. Both files will share the common parent.
2. Group each set of related struct address_space instances in a linked list. Whenever I look for an existing page, search the entire linked list. In this scenario, though, it would not be allowed to "find" a dirty page in a sibling's address_space. In other words, once a page becomes dirty it can no longer serve as a shared copy. If anyone ever wrote data to the file, I would need to disassociate the address_spaces, and I would also need some sort of C-O-W behavior to sustain this.
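
To make the first strategy concrete, here is a rough user-space sketch of the lookup and C-O-W behavior I have in mind. It is not kernel code: `cow_mapping`, `cow_find_page`, and `cow_get_page_for_write` are hypothetical names invented for illustration, the page size is tiny on purpose, and locking, reference counting, and eviction are ignored entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define COW_PAGE_SIZE 16  /* tiny pages keep the model readable */
#define COW_NR_PAGES  4   /* pages per file in this model */

/* Hypothetical user-space stand-in for struct address_space. */
struct cow_mapping {
    struct cow_mapping *parent;          /* shared read-only parent, or NULL */
    unsigned char *pages[COW_NR_PAGES];  /* NULL means "not cached here" */
};

/* Read path: check the file's own mapping first, then fall back to the parent. */
const unsigned char *cow_find_page(const struct cow_mapping *m, size_t idx)
{
    assert(m != NULL);
    if (idx >= COW_NR_PAGES)
        return NULL;
    if (m->pages[idx])
        return m->pages[idx];
    if (m->parent)
        return m->parent->pages[idx];
    return NULL;
}

/*
 * Write path: return a page that is private to m. If only the parent has
 * the page, copy it into m first so the parent's copy is never modified.
 * If nobody has it, a zeroed page is returned (a real implementation would
 * read it from disk instead).
 */
unsigned char *cow_get_page_for_write(struct cow_mapping *m, size_t idx)
{
    const unsigned char *src;
    unsigned char *copy;

    assert(m != NULL);
    if (idx >= COW_NR_PAGES)
        return NULL;
    if (m->pages[idx])
        return m->pages[idx];  /* already private */

    copy = calloc(1, COW_PAGE_SIZE);
    if (!copy)
        return NULL;
    src = m->parent ? m->parent->pages[idx] : NULL;
    if (src)
        memcpy(copy, src, COW_PAGE_SIZE);
    m->pages[idx] = copy;
    return copy;
}
```

In this model, two files pointing at the same parent read the same page until one of them writes, at which point only the writer gets a private copy and the other file keeps seeing the parent's page.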

Can anyone tell me:

1. Are either or both of these ideas sound?
2. What things in particular should I watch out for?

As a point of reference, I am doing a custom kernel hack to save memory, because on my system multiple identical files are being opened (but they are not the same inode, so they do not share page cache).

EDIT: 3rd idea:

Keep a linked list of the "related" address_space structs, and every time we read a page from disk, update every related address_space that's open. Opening a new related file would then have to copy the existing page cache entries into its address_space, skipping any dirty pages.
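
Roughly, the third idea could be modeled like this in user space. Again, this is only a sketch under my own assumptions: `rel_page`, `rel_mapping`, `rel_publish_page`, and `rel_inherit_clean` are made-up names, a page is reduced to a reference count plus one integer of "data", and the per-slot `dirty` flag stands in for whatever the kernel would really use to tell that a file's copy has diverged.

```c
#include <assert.h>
#include <stddef.h>

#define REL_NR_PAGES 4  /* pages per file in this model */

/* Hypothetical refcounted page shared between related files. */
struct rel_page {
    int refcount;
    int data;  /* stand-in for the page contents */
};

/* Hypothetical user-space stand-in for struct address_space. */
struct rel_mapping {
    struct rel_mapping *next_related;   /* list of files known to be identical */
    struct rel_page *pages[REL_NR_PAGES];
    unsigned char dirty[REL_NR_PAGES];  /* nonzero: this file modified the slot */
};

/*
 * After a page has been read from disk, install it in every related mapping
 * that has no page in that slot and has not modified it.
 * Returns how many mappings received the page.
 */
int rel_publish_page(struct rel_mapping *head, size_t idx, struct rel_page *page)
{
    struct rel_mapping *m;
    int installed = 0;

    if (idx >= REL_NR_PAGES || !page)
        return 0;
    for (m = head; m; m = m->next_related) {
        if (m->pages[idx] || m->dirty[idx])
            continue;
        m->pages[idx] = page;
        page->refcount++;
        installed++;
    }
    return installed;
}

/*
 * When a new related file is opened, share every clean page that an
 * existing member already caches; dirty pages are skipped.
 * Returns how many pages were shared.
 */
int rel_inherit_clean(struct rel_mapping *newm, const struct rel_mapping *from)
{
    size_t i;
    int shared = 0;

    for (i = 0; i < REL_NR_PAGES; i++) {
        if (!from->pages[i] || from->dirty[i] || newm->pages[i])
            continue;
        newm->pages[i] = from->pages[i];
        newm->pages[i]->refcount++;
        shared++;
    }
    return shared;
}
```

Compared with the parent-pointer approach, lookups stay as cheap as they are today, but every disk read and every open has to walk the list of related mappings.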

Your idea sounds interesting and reminds me of [KSM](http://www.kernel.org/doc/Documentation/vm/ksm.txt). But KSM doesn't merge pagecache pages, maybe there was no need or there's another reason. — cnicutar, Mar 23 '12 at 21:58
Yes, similar to KSM. Except unlike KSM, I want to use special knowledge of what's going on to catch and share pages as they happen, rather than comparing the data later and realizing they're the same. — Robert Martin, Mar 23 '12 at 22:27
@RobertMartin: I haven't looked at that part of the kernel since Unix, but aren't those separate `struct file` pointing to the same `inode`? — wallyk, Mar 23 '12 at 23:02
If they're the same inode, things take care of themselves, yeah. But in my case, I have _different_ inodes but identical file contents. It's because I have a union filesystem and I'm opening the files 'rw' — Robert Martin, Mar 23 '12 at 23:11
