
I'm trying to use Java MappedByteBuffers in READ_WRITE mode to map either one large file (tens of GB) or a set of numerous smallish files (~128 MB each). The goal is to implement high-performance B-trees.

My problem is that on my Windows 7 laptop with 8 GB of RAM and JDK 7, everything goes fine until physical memory fills up and the OS starts actually writing the data to the files. At that point, Windows slows to a crawl. The I/Os seem to starve every other activity: the mouse pointer can barely move, and I generally end up having to force-reboot the machine.

The following code demonstrates the issue:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public static void testMap() throws Exception
{
    MappedByteBuffer[] mbbs = new MappedByteBuffer[512];
    for (int i = 0; i < 512; i++)
    {
        System.out.printf("i=%d%n", i);
        try (RandomAccessFile raf = new RandomAccessFile(String.format("D:/testMap.%d.bin", i), "rw"))
        {
            // Map a 128 MB read/write view; the mapping remains valid after the channel is closed.
            mbbs[i] = raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, 128*1024*1024);
        }
        // Dirty one byte every 1 KB, touching every 4 KB page of the mapping.
        for (int j = 0; j < 128*1024; j++) {
            mbbs[i].put(j*1024, (byte)(i*j));
        }
    }
}

I don't mind if the I/Os take some time; sooner or later the OS has to write the bytes to the files anyway. But here the process essentially starves the whole OS. How can I avoid this?
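For what it's worth, one experiment I can imagine (my own variation, not something the original code does) is to force each segment's dirty pages to disk as soon as that segment is fully written, so writeback happens incrementally instead of piling up under memory pressure. The method name testMapWithFlush is made up for illustration; whether this fully cures the stall depends on the OS's writeback behavior:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public static void testMapWithFlush() throws Exception
{
    for (int i = 0; i < 512; i++)
    {
        try (RandomAccessFile raf = new RandomAccessFile(String.format("D:/testMap.%d.bin", i), "rw"))
        {
            MappedByteBuffer mbb =
                raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, 128*1024*1024);
            for (int j = 0; j < 128*1024; j++) {
                mbb.put(j*1024, (byte)(i*j));
            }
            // Synchronously write this segment's dirty pages before mapping the next one,
            // bounding the dirty-page backlog to roughly one 128 MB window at a time.
            mbb.force();
        }
    }
}

This trades throughput for bounded dirty-page pressure: force() blocks until the segment's content is written to the storage device.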

  • It is important to recognize that you will hit resource limits when working with scarce resources. Is there a reason you need 512 memory-mapped files open at the same time? Depending on your use case, there are a number of approaches you might take, including pooling scarce resources (open file handles). – allingeek Sep 24 '12 at 20:33
  • In reality, I expect the files to be accessed pretty much randomly, which is why I would like to keep all the views mapped at the same time. This seems to be accepted as OK practice, on a 64-bit machine of course; see http://stackoverflow.com/questions/9261316/memory-mapped-mappedbytebuffer-or-direct-bytebuffer-for-db-implementation for example. It works just fine for reading, but not so well for writing. I suspect that after a while, any memory access from any process forces the OS to swap out some of the dirty pages loaded by the JVM process. Any suggestion to avoid this? – alex137 Sep 24 '12 at 21:25
  • A pooled solution would work reasonably well, depending on how many files you expect to be accessed concurrently; a minimal sketch of the idea appears after these comments. Using a pooled approach, you can tune for performance by varying the pool size. In any case, when the system runs out of RAM the OS has to start swapping, which is why the whole machine comes to a crawl. – allingeek Sep 24 '12 at 21:30
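Here is a minimal sketch of the pooling approach suggested in the comments, assuming a simple LRU policy and a fixed 128 MB segment per file; the class name MappedPool and the pool size are invented for illustration, not part of any library:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.LinkedHashMap;
import java.util.Map;

class MappedPool {
    private static final long SEGMENT_SIZE = 128L * 1024 * 1024;
    private final int maxOpen;
    // An access-ordered LinkedHashMap gives LRU eviction for free.
    private final LinkedHashMap<String, MappedByteBuffer> pool;

    MappedPool(int maxOpen) {
        this.maxOpen = maxOpen;
        this.pool = new LinkedHashMap<String, MappedByteBuffer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, MappedByteBuffer> eldest) {
                if (size() > MappedPool.this.maxOpen) {
                    eldest.getValue().force(); // flush the evicted segment's dirty pages to disk
                    return true;               // drop our reference to the mapping
                }
                return false;
            }
        };
    }

    MappedByteBuffer get(String path) throws IOException {
        MappedByteBuffer mbb = pool.get(path);
        if (mbb == null) {
            try (RandomAccessFile raf = new RandomAccessFile(path, "rw")) {
                // The mapping remains valid after the channel is closed.
                mbb = raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, SEGMENT_SIZE);
            }
            pool.put(path, mbb);
        }
        return mbb;
    }
}

Note that eviction only drops the Java reference; the actual unmapping happens when the buffer is garbage-collected, so the pool size should leave some headroom below the real limit you are targeting.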

1 Answer


You are mapping 512 × 128 × 1024 × 1024 bytes, or 64 GB, and you have 8 GB. So you are over-committing memory by a factor of 8. That's OK for reading, because the OS can just page-fault you to a page full of nulls, and all those pages could even be the same physical page. But when you write, each page has to be brought into real existence and eventually written back, so you are thrashing. You need either 8x the main memory or less than 1/8 the amount of mapped byte buffers.

– user207421
  • Thanks for your answer. I was expecting the OS to be clever enough to swap out dirty pages before the whole system thrashes. You see, I'm not over-committing memory: I don't mind the OS doing lots of disk I/O, because the OS is supposed to be better placed than the program to manage its caches. By the way, the mapping works just fine for reading even when the pages are not full of nulls (in my code, almost every page contains some non-null byte): the OS just triggers the I/Os and manages the caches fine, which is the whole point of using memory-mapped files. – alex137 Sep 25 '12 at 11:40
  • @user1695431 Committing 8x the amount of physical memory is over-committing, by definition. – user207421 Sep 25 '12 at 23:45
  • But I don't want to commit this virtual memory region to physical memory. I'm OK with the OS swapping the pages out while the program writes into them. I thought that was one of the advantages of memory-mapped files: being able to access a file as if it were memory, even when the file is bigger than available physical memory. Besides, it works very well for reading. So I'm still looking for a satisfactory solution to my problem... – alex137 Sep 26 '12 at 10:21