
I'm using a memory-mapped file that holds approx. 100 GB of data. When I call CreateViewStream on that file, it takes 30 minutes to create the stream. This seems to be because of the size of the memory-mapped file, but why does it take so long? Does it copy the whole file into managed memory?

Strangely, it takes much longer when I write the file with a file stream and then access it without a reboot in between.

displayName
Sebastian

2 Answers


I'm unable to replicate these issues. Here's the code I used to test:

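    using System;
    using System.Diagnostics;
    using System.IO.MemoryMappedFiles;

    class Program
    {
        static void Main(string[] args)
        {
            var sw = Stopwatch.StartNew();
            // Map the existing file and open a view stream over all of it.
            using (var mmf = MemoryMappedFile.CreateFromFile(@"f:\test.bin"))
            using (var stream = mmf.CreateViewStream())
            {
                for (int i = 0; i < 100000; i++)
                {
                    stream.ReadByte();
                }
            }
            Console.WriteLine(sw.Elapsed);
        }
    }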

f:\test.bin is a 100GB zero filled file that I generated for the purposes of this test. I'm able to create the MemoryMappedFile, then run CreateViewStream and read 100,000 bytes from it in 3.7s.

Please provide sample code that's exhibiting the behavior you've described and I'll be glad to pick it apart and see what's going on.

willaien
  • I think you should do more than just a `ReadByte` in your `for` loop. The compiler may have optimized the loop out, which would explain why your loop runs in 3.7 seconds. Just my guess, I may be wrong. – displayName Sep 02 '15 at 15:20
  • @displayName: I have since deleted the test file. I can recreate it and do something, anything with it, but it's not important. The claim is that a call to `CreateViewStream()` is expensive (and takes an inordinately long time on his system), but I've demonstrated that it, in and of itself, for a large file, is not that expensive (at least not on my own system). He needs to provide more code for anyone to be able to evaluate his claims. – willaien Sep 02 '15 at 15:26
  • Hmmm... Seems like the OP's RAM/architecture is the issue. – displayName Sep 02 '15 at 15:29
  • 3.7 seconds to read 100 KB is an expensive operation and is pitifully slow. – Matt Jun 18 '18 at 14:46

This is a difficult one to answer without the code and without knowing your main memory and architecture, so I can only offer some pointers:

  1. Do you have enough RAM? When you touch an address whose page has not yet been loaded into RAM, a page fault occurs behind the scenes and the OS reads the data into RAM for you. Your program doesn’t notice this activity because your thread is suspended while the page fault is processed. Good article here.
  2. Another important point from the same article: you have no control over how much of the MMF is kept in memory or for how long. This means that using an MMF may push other things out of RAM, such as code or data pages that you will need back “soon”, resulting in slower execution. I especially want to point anyone reading this answer to another answer here, so that we have a clear idea of how slow this is in terms of processor cycles.
  3. Next, you are creating a stream. Streams are good for sequential access, while you might be trying to read from or write to it randomly.

Regarding the end-to-end run time of your code with the FileStream vs. the MMF approach, I think you should run the tests afresh, because running the first approach might leave a warmed-up cache for the second one. The results won't be correct then.
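To see where the time actually goes, it may help to time the three steps separately: creating the mapping, creating the view stream, and touching the data (which is where page faults would show up). This is only a minimal sketch; `MapTiming` and `Measure` are names I made up for illustration, the file path is taken from the command line, and it assumes a compiler with tuple support (C# 7 or later).

```csharp
using System;
using System.Diagnostics;
using System.IO.MemoryMappedFiles;

static class MapTiming
{
    // Hypothetical helper: times mapping, view creation, and reading separately.
    public static (TimeSpan Map, TimeSpan View, TimeSpan Read) Measure(string path, int bytesToRead)
    {
        var sw = Stopwatch.StartNew();
        using (var mmf = MemoryMappedFile.CreateFromFile(path))
        {
            TimeSpan map = sw.Elapsed;

            sw.Restart();
            using (var stream = mmf.CreateViewStream())
            {
                TimeSpan view = sw.Elapsed;

                sw.Restart();
                for (int i = 0; i < bytesToRead; i++)
                {
                    if (stream.ReadByte() < 0) break; // end of the view
                }
                return (map, view, sw.Elapsed);
            }
        }
    }

    static void Main(string[] args)
    {
        // args[0] is the path of the file to map (an assumption of this sketch).
        var t = Measure(args[0], 100000);
        Console.WriteLine($"map: {t.Map}, view: {t.View}, read: {t.Read}");
    }
}
```

Run it once right after a reboot and once after writing the file with a FileStream, and compare the three numbers rather than a single total.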

According to the MSDN documentation of MMF,

Memory-mapped files enable programmers to work with extremely large files because memory can be managed concurrently, and they allow complete, random access to a file without the need for seeking.

The way an MMF works is that the entire file (or a portion of it) is mapped as virtual memory, which the OS transparently pages in and out of physical memory as you access portions of the file. This is why MMFs are good for working with large files in the first place.

You can be smarter and map only a part of the file, then perform random access within that part, by making use of:

using (var accessor = mmf.CreateViewAccessor(offset, length))
{
    //Here you have access to a specific part of the file
}

so that you have access to a view with specified offset and size, of your mammoth file's memory-mapping.
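For completeness, here is a self-contained sketch of that windowed approach. `WindowedRead` and `ReadWindow` are hypothetical names, and the demo in `Main` uses a small temporary file rather than a 100 GB one; the requested window must lie inside the file, otherwise creating the view throws.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

static class WindowedRead
{
    // Hypothetical helper: maps only [offset, offset + count) and copies it out.
    public static byte[] ReadWindow(string path, long offset, int count)
    {
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var accessor = mmf.CreateViewAccessor(offset, count, MemoryMappedFileAccess.Read))
        {
            var buffer = new byte[count];
            // Positions passed to the accessor are relative to the start of the view.
            accessor.ReadArray(0, buffer, 0, count);
            return buffer;
        }
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // Fill a small demo file with the repeating pattern 0, 1, ..., 255.
            var data = new byte[4096];
            for (int i = 0; i < data.Length; i++) data[i] = (byte)(i % 256);
            File.WriteAllBytes(path, data);

            byte[] window = ReadWindow(path, 1000, 4);
            Console.WriteLine(string.Join(", ", window));
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Only the requested window is mapped into the address space, so the view size stays small no matter how large the underlying file is.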

displayName