
I'm looking for help on how to handle access to a large file/block device (defined as: larger than the addressable memory) transparently and sanely within my library. Say we have a 512GB block device on a 32-bit architecture. 512GB is way more than we can address on a 32-bit architecture, and managing portions of the device/file in memory using mmap() is something I'm trying to avoid.

What I'm trying to achieve is to get blocks that are addressed by 64-bit numbers/offsets and whose size is arbitrary but fixed per device (512 bytes, 4K, 8K, 64MB, etc.). The caller should just get the memory address and should not have to take care of freeing memory or loading the actual content into memory.
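A minimal sketch of the addressing side, assuming a POSIX system and glibc-style large file support via `_FILE_OFFSET_BITS=64`: the byte offset of a block is computed as `blk_no * blk_size` in 64-bit arithmetic, and `pread()` can then read any block of a 512GB device even from a 32-bit process, because only the block-sized buffer has to fit in the address space. `read_block` is a hypothetical helper name, not an existing API.

```c
/* Large file support: these must come before any #include so that
 * off_t is 64 bits wide even on 32-bit Linux/glibc. */
#define _XOPEN_SOURCE 700
#define _FILE_OFFSET_BITS 64

#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read block number blk_no (blk_size bytes each)
 * from fd into buf. Returns 0 on success, -1 with errno set on failure. */
int read_block(int fd, uint64_t blk_no, size_t blk_size, void *buf)
{
    if (blk_size == 0) {
        errno = EINVAL;
        return -1;
    }
    /* Reject block numbers whose end offset would not fit in off_t. */
    if (blk_no >= (uint64_t)INT64_MAX / blk_size) {
        errno = EOVERFLOW;
        return -1;
    }
    off_t off = (off_t)(blk_no * blk_size);

    /* pread() may return fewer bytes than requested, so loop. */
    size_t done = 0;
    while (done < blk_size) {
        ssize_t n = pread(fd, (char *)buf + done, blk_size - done,
                          off + (off_t)done);
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted by a signal: retry */
        if (n < 0)
            return -1;              /* real I/O error, errno from pread() */
        if (n == 0) {
            errno = EIO;            /* assumption: treat EOF as an error */
            return -1;
        }
        done += (size_t)n;
    }
    return 0;
}
```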

I was thinking about a mechanism as follows:

  • something like a `void *get_file_address(uint64_t blk_offset)` call that takes an offset (the block number), checks whether this block is already mapped and, if not, reads it in and thereby maps it
  • some structure that keeps track of access counts to the blocks (updated on every get_file_address call)
  • a memory manager that kicks in when memory gets low and starts unloading rarely used blocks based on the aforementioned structure
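The three points above could be sketched roughly like this, under several assumptions: a fixed budget of `CACHE_SLOTS` blocks stands in for the "memory gets low" trigger, eviction picks the slot with the lowest access count, blocks are read with a 64-bit offset `pread()`, and the cache is passed explicitly instead of being global state. `struct blkcache`, `cache_init` and `cache_destroy` are hypothetical names; `get_file_address` follows the idea from the question with an added cache parameter.

```c
#define _XOPEN_SOURCE 700
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t on 32-bit Linux/glibc */

#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define CACHE_SLOTS 8  /* hypothetical fixed budget: at most 8 blocks in RAM */

struct slot {
    uint64_t blk;    /* block number currently held */
    uint64_t hits;   /* access count, used to pick eviction victims */
    int      valid;  /* 0 = empty or being refilled */
    char    *data;   /* blk_size bytes inside the pool */
};

struct blkcache {
    int         fd;
    size_t      blk_size;
    char       *pool;  /* CACHE_SLOTS * blk_size bytes */
    struct slot slots[CACHE_SLOTS];
};

int cache_init(struct blkcache *c, int fd, size_t blk_size)
{
    if (blk_size == 0 || blk_size > SIZE_MAX / CACHE_SLOTS)
        return -1;
    c->pool = malloc(CACHE_SLOTS * blk_size);
    if (c->pool == NULL)
        return -1;
    c->fd = fd;
    c->blk_size = blk_size;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        c->slots[i].valid = 0;
        c->slots[i].hits = 0;
        c->slots[i].data = c->pool + (size_t)i * blk_size;
    }
    return 0;
}

void cache_destroy(struct blkcache *c)
{
    free(c->pool);
    c->pool = NULL;
}

/* Return a pointer to block blk, reading it from disk on a miss.
 * The pointer stays valid only until a later call evicts that slot. */
void *get_file_address(struct blkcache *c, uint64_t blk)
{
    struct slot *victim = NULL;

    for (int i = 0; i < CACHE_SLOTS; i++) {
        struct slot *s = &c->slots[i];
        if (s->valid && s->blk == blk) {   /* hit: count the access */
            s->hits++;
            return s->data;
        }
        /* Prefer an empty slot, else the least frequently used one. */
        if (victim == NULL ||
            (victim->valid && (!s->valid || s->hits < victim->hits)))
            victim = s;
    }

    /* Miss: reject offsets that do not fit in off_t, then refill. */
    if (blk >= (uint64_t)INT64_MAX / c->blk_size)
        return NULL;
    off_t off = (off_t)(blk * c->blk_size);
    victim->valid = 0;
    for (size_t done = 0; done < c->blk_size; ) {
        ssize_t n = pread(c->fd, victim->data + done,
                          c->blk_size - done, off + (off_t)done);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)        /* I/O error or end of file/device */
            return NULL;
        done += (size_t)n;
    }
    victim->blk = blk;
    victim->hits = 1;
    victim->valid = 1;
    return victim->data;
}
```

One caveat this sketch makes visible: because the caller never frees anything, a returned pointer is only valid until a later call evicts its slot, so a real library would need pin/unpin (reference counts) around each access. Pure access counting also keeps blocks that were hot long ago forever; aging the counters or switching to LRU avoids that, and a hash table would replace the linear slot scan for larger caches.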

The last point bothers me: writing a memory manager myself doesn't seem sane. Besides, I'm sure I'm not the first one with this problem.

So is there any solution/library/code fragment out there that already helps with such a case or a similar one? I'm fine with solutions for Windows, Linux, *BSD or OS X.

grasbueschel
  • Why don't you `mmap` only a particular portion of the file and access it in blocks, as you say? `mmap` supports offsets, so you can just traverse the file in blocks of 4K. – Nobilis Aug 31 '13 at 18:56
  • @Nobilis because if I use mmap, I have to use munmap too, and that's the job of a memory manager, which I want to avoid coding myself since that job seems to have been done already: getting the memory address transparently by providing a device/file + offset. – grasbueschel Aug 31 '13 at 19:46
  • Looks like you're gonna need 512GB of swap... I'm kidding guys, jeez... – Trevor Arjeski Aug 31 '13 at 19:48
  • @Nobilis sorry, the question was a bit misleading, so I've edited it now to be more specific. Thanks for the input! – grasbueschel Aug 31 '13 at 19:59
  • @grasbueschel No problem, hope you do come across what you need, you're probably right that somebody has already done a wrapper of some sort for this :) – Nobilis Aug 31 '13 at 21:08

1 Answer


I would use "framed mmap" with "large file support", which has been part of Linux for a long time now. Start from the Wikipedia article and then move on to the technical details on the SuSE web site.
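A "framed" mapping in the sense of this answer might look like the following minimal sketch, assuming POSIX `mmap()` and `_FILE_OFFSET_BITS=64` so that `off_t` can express offsets beyond 4GB on 32-bit Linux. Only a small window (frame) around the requested offset is mapped; the offset passed to `mmap()` must be a multiple of the page size, so it is rounded down and the returned pointer is adjusted. `struct frame`, `frame_map` and `frame_unmap` are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t for mmap() on 32-bit glibc */

#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical "frame": one mmap()ed window onto a large file. */
struct frame {
    void   *base;    /* address returned by mmap() (page aligned) */
    size_t  maplen;  /* length passed to mmap()/munmap() */
    char   *ptr;     /* points at the requested offset inside the window */
};

/* Map len bytes starting at byte offset off (need not be page aligned).
 * Returns 0 on success, -1 on failure. */
int frame_map(struct frame *f, int fd, uint64_t off, size_t len)
{
    uint64_t page = (uint64_t)sysconf(_SC_PAGESIZE);
    uint64_t aligned = off - off % page;  /* mmap() needs a page-aligned offset */
    size_t delta = (size_t)(off - aligned);

    if (aligned > (uint64_t)INT64_MAX || len > SIZE_MAX - delta)
        return -1;
    void *p = mmap(NULL, len + delta, PROT_READ, MAP_SHARED, fd,
                   (off_t)aligned);
    if (p == MAP_FAILED)
        return -1;
    f->base = p;
    f->maplen = len + delta;
    f->ptr = (char *)p + delta;
    return 0;
}

/* Release the window; f->ptr must not be used afterwards. */
void frame_unmap(struct frame *f)
{
    munmap(f->base, f->maplen);
    f->base = NULL;
    f->ptr = NULL;
}
```

Moving to another part of the device means `frame_unmap()` followed by a new `frame_map()`, which is exactly the bookkeeping the question wants to avoid writing by hand; keeping a few frames around and evicting the least used one would be the next step.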

There are also some examples online and a few answers here on Stack Overflow. I don't think you will easily find a ready-made library. As the links above suggest, the source code of software that handles large multimedia files could be helpful, and its "framed" nature could lead to some interesting snippets.

EnzoR