I'm wondering if there is a way to insert blank pages near the beginning of a large (multi-GB) file that I have open with mmap(). Obviously it would be possible to add a page or two to the end and move everything forward with memcpy(), but this would dirty every page and take an awfully long time when eventually flushed to disk.

I'm guessing that a solution would require some complex coordination between a customized filesystem and manual manipulation of the page tables: add a block to the inode, somehow update the cached pages in the VMM to reflect this, then somehow swizzle the page table to match. This sounds non-trivial, which makes me wonder if there's a better way.

This is intended as a somewhat deep question about memory and file manipulation on Linux, although I'd be happy to hear about how this can be done in other systems. I'm not particularly interested in workarounds that involve making the copying more efficient, although a technique that requires remapping but avoids the disk IO would be a good start.

Nathan Kurz
  • I foresee a lot of corner cases, e.g. what happens to the offsets of file handles to that file? What about file handles in other processes? – caf Sep 24 '10 at 05:46
  • Without knowing what you are doing this might be a stupid suggestion, but can you pad the physical file with X pages before you mmap it? Keep an index/pointer/displacement in the first page to where the real data starts and change it as required. More work all around, but you seem to be willing to go to great lengths to make this work. – Duck Sep 24 '10 at 06:09
  • @Duck: I don't have a set purpose in mind. I've wanted something like this when dealing with large inverted indexes for full text search and when looking at ways to make better B-Trees. Starting with an extremely sparse file is an interesting idea, but doesn't solve the general case. – Nathan Kurz Sep 24 '10 at 18:54
  • @caf: I'm sure there would be issues, but they don't worry me. If one can make it work for multiple mmap() instances, the rest seems solvable. My real hope is that someone will pipe up and say "the latest beta for ZFS supports this" or "mmap_insert_page() was added in 2.6.X". – Nathan Kurz Sep 24 '10 at 18:58
  • @Nathan Kurz: Knowing the number of bugs, corner-cases and hairy problems caused just by `ftruncate()`, I tend to doubt the prospects of something like this appearing in the mainline kernel... – caf Sep 25 '10 at 02:22
  • It looks like the remap_file_pages() function has existed since 2.5, and might be able to handle the in-memory side of this. There's an example online of using it to reverse the order of pages: http://www.technovelty.org/code/linux/fremap.html – Nathan Kurz Oct 04 '10 at 07:46

1 Answer

Embed a simple FAT (file allocation table) in your file. For instance, the first 4 KB of the file would be the FAT page, and data would live in the following pages. The first FAT page could link to further FAT pages as the file grows. Each FAT entry would hold a data page index and a reference to the next FAT entry, where a reference is the FAT page plus the index of the entry on that page. I think you get the idea: the FAT entries form a linked list, the FAT pages form a linked list, and the FAT entries link the data pages together. Inserting a page then means appending it to the file and splicing a new entry into the chain. This should be enough information to use remap_file_pages() to make your file look contiguous in memory even though it's not contiguous on disk.
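
A minimal sketch of this idea in C, under several assumptions that are not part of the answer: a single FAT page (no chaining of FAT pages), the chain head and entry count kept in ordinary variables rather than in an on-disk header, and hypothetical names (`fat_entry`, `map_chain`, `insert_page_after`, `fat_demo`). It also swaps in one `mmap()` call per page with `MAP_FIXED` over a reserved address range instead of `remap_file_pages()`, since that call has been deprecated since Linux 3.16; the resulting view is the same, at the cost of one mapping per page.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define FAT_END UINT32_MAX

/* Hypothetical on-disk entry: one link in the chain of logical pages. */
struct fat_entry {
    uint32_t file_page; /* page of the file that holds this logical page */
    uint32_t next;      /* index of the next entry, or FAT_END */
};

/* Map every page of the chain, in chain order, as one contiguous view. */
static void *map_chain(int fd, const struct fat_entry *fat, uint32_t head,
                       size_t *len_out) {
    assert(len_out != NULL);
    size_t page = (size_t)sysconf(_SC_PAGESIZE), n = 0;
    for (uint32_t e = head; e != FAT_END; e = fat[e].next) n++;
    if (n == 0) return MAP_FAILED;
    /* Reserve address space so the per-page mappings land side by side. */
    char *base = mmap(NULL, n * page, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return MAP_FAILED;
    char *dst = base;
    for (uint32_t e = head; e != FAT_END; e = fat[e].next, dst += page) {
        void *p = mmap(dst, page, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_FIXED, fd,
                       (off_t)fat[e].file_page * (off_t)page);
        if (p == MAP_FAILED) { munmap(base, n * page); return MAP_FAILED; }
    }
    *len_out = n * page;
    return base;
}

/* Append one blank page to the file and splice it in after entry `after`. */
static int insert_page_after(int fd, struct fat_entry *fat, uint32_t *count,
                             uint32_t after) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (*count >= page / sizeof(struct fat_entry)) return -1; /* FAT full */
    off_t end = lseek(fd, 0, SEEK_END);
    if (end < 0 || ftruncate(fd, end + (off_t)page) != 0) return -1;
    uint32_t e = (*count)++;
    fat[e].file_page = (uint32_t)(end / (off_t)page);
    fat[e].next = fat[after].next;
    fat[after].next = e;
    return 0;
}

/* Build a file of FAT + pages A, B, C, then insert a blank page after A. */
int fat_demo(void) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE), len = 0;
    char path[] = "/tmp/fat_sketch_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)(4 * page)) != 0) return -1;
    /* File page 0 is the FAT; file pages 1..3 hold the data. */
    struct fat_entry *fat = mmap(NULL, page, PROT_READ | PROT_WRITE,
                                 MAP_SHARED, fd, 0);
    if (fat == MAP_FAILED) return -1;
    uint32_t count = 0;
    for (; count < 3; count++) {
        fat[count].file_page = count + 1;
        fat[count].next = (count + 1 < 3) ? count + 1 : FAT_END;
    }
    char *view = map_chain(fd, fat, 0, &len);
    if (view == MAP_FAILED) return -1;
    memset(view, 'A', page);
    memset(view + page, 'B', page);
    memset(view + 2 * page, 'C', page);
    munmap(view, len);

    if (insert_page_after(fd, fat, &count, 0) != 0) return -1;
    view = map_chain(fd, fat, 0, &len);
    if (view == MAP_FAILED) return -1;
    /* Logical order is now A, blank, B, C; no data page was copied. */
    int ok = len == 4 * page && view[0] == 'A' && view[page] == 0 &&
             view[2 * page] == 'B' && view[3 * page] == 'C';
    munmap(view, len);
    munmap(fat, page);
    close(fd);
    return ok ? 0 : -1;
}
```

Calling `fat_demo()` from a `main()` returns 0 when the remapped view reads A, blank, B, C. Only the FAT page and the newly appended page are dirtied by the insert; the cost moves to setup time and to the number of mappings, which is why a real design would likely work in larger extents than single pages.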

johnnycrash