mmap() slower than ofstream() for writing large amounts of data

Question

I am currently working on a project which requires writing a large amount of data (i.e. hundreds of gigabytes) to a file by using a server that has 32 cores and 64 GB of RAM. Basically, I am trying to figure out what is the most efficient way to do it in c++, so I tried mmap() and ofstream() with multiple threads writing data at the same time.

To describe my implementation, Let's make it simple, assume that we only have 4 cores and we are about to generate 100 GB of data. Using mmap(), we can do something like this:

    void Generate_data(){
         int file = open("disk_cache", O_RDWR|O_CREAT, 0644);
         lseek (file,100GB, SEEK_SET);
         write (file, "", 1);
         auto* mapPtr = (unsigned*)mmap(nullptr, 100GB, PROT_WRITE|PROT_READ, MAP_SHARED, file, 0);
         if ( mapPtr == MAP_FAILED) {
            perror("mmap");
            return ;
         }
         vector<unsigned*> startPtrs (omp_get_max_threads());
         // Set where should we start
         for (unsigned i = 0; i < omp_get_max_threads(); i ++){
              //divide 4 since unsigned is 4 bytes;
              startPtrs[i] = mapPtr + 25GB/4 * i;
         }
         #pragma omp parallel
         {
              const unsigned thread_id = omp_get_thread_num();
              unsigned* sPtr = startPtrs[thread_id];
              // Generate 25 GB data;
              for(unsigned i = 0; i<25; i++){
                 //do something, generate 1GB data
                 const vector<unsigned>& my_data = create_1GB_data();  
                 //write data to the shared memory                                                                
                 memcpy(sPtr,my_data.data(),1GB);
                 sPtr += my_data.size();
              }
         }  
         if (munmap(mapPtr, 100GB) == -1) {
            perror("mmap:");
            return ;
         }
         close(file);
    }

Alternatively, we can use std::ofstream to write data. Here are some details:

    void Generate_data(){
         #pragma omp parallel
         {
              std::ofstream out("disk_cache",std::ios::binary);
              // Set where should we start
              out.seekp (25GB * omp_get_thread_num(), std::ios::beg);
              // Generate 25 GB data;
              for(unsigned i = 0; i<25; i++){
                 // do something, generate 1GB data
                 const vector<unsigned>& my_data = create_1GB_data();
                 // write data to the file.                                                                  
                 out.write(reinterpret_cast<const char*>(my_data.data(), 1GB);
              }
              out.close();
         }  
    }

The ofstream() approach seems to be faster than mmap(), I wonder why this happened? I thought mmap() is the most efficient way for writing/reading large amounts of data which bigger than RAM. Am I using mmap() in a right way? Any ideas that I can improve? or mmap() suppose to be slow for writing a file.

Updates:

I did some modifications to the code by passing sPtr into create_1GB_data() and directly modify data over there. In this way, we can definitely avoid copy. But that seems like not the main reason why mmap() is slow. I did some observation by using htop. mmap() uses a large amount of RES and SHR memory, I guess that's because I use MAP_SHARED flag and the entire memory is shared between different threads although that's not necessary. On the other hand, ofstream does not open that much of RES and SHR memory, each thread seems to be independent with each other. Should we not use mmap() in this case?

Any helps would be appreciated, thanks in advance.

The point of `mmap()` is that you can avoid copying your data at all. If you're `memcpy()`ing the data to the file you're already sabotaging the main benefit of `mmap()` . — EOF, Nov 15 '20 at 16:49
I thought memcpy() is what we should use with mmap() according to [this](https://stackoverflow.com/questions/38194948/why-ftruncatemmapmemcpy-is-faster-than-write) — PleaseHelpppp, Nov 16 '20 at 05:30
@EOF I am a little bit confused. Since we generated the data and return as a vector, it seems like I have to copy the data from the vector to the memory. Are you saying we should pass a pointer to create_1GB_data() and directly modify the data from there? — PleaseHelpppp, Nov 16 '20 at 06:37

Maxim Egorushkin · Answer 1 · 2020-11-16T13:32:16.753

A few improvements you may like to apply:

The way you resize the file by seeking and writing one byte may create a sparse file depending on the filesystem, which may be sub-optimal in terms of performance (but optimal in disk space usage). A robust way is to use ftruncate.
Use mmap with MAP_POPULATE flag to avoid page faults on accessing each page of the mapping. You need to have enough RAM for that, which you don't, so map the file by chunks which can fit into your RAM.
Write directly into the memory mapping rather than creating a vector with data and then copying it into the memory mapping, which defeats the purpose of using mmap.

mmap() slower than ofstream() for writing large amounts of data

1 Answers1