I am currently working on a project that requires writing a large amount of data (hundreds of gigabytes) to a file, on a server with 32 cores and 64 GB of RAM. Basically, I am trying to figure out the most efficient way to do this in C++, so I tried mmap() and std::ofstream with multiple threads writing data at the same time.
To keep the description simple, assume that we only have 4 cores and we are about to generate 100 GB of data. Using mmap(), we can do something like this:
#include <fcntl.h>
#include <omp.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>
#include <vector>

constexpr size_t GB = 1ULL << 30;

void Generate_data() {
    int file = open("disk_cache", O_RDWR | O_CREAT, 0644);
    // Extend the file to 100 GB: seek to the last byte and write it.
    lseek(file, 100 * GB - 1, SEEK_SET);
    write(file, "", 1);
    auto* mapPtr = (unsigned*)mmap(nullptr, 100 * GB, PROT_WRITE | PROT_READ,
                                   MAP_SHARED, file, 0);
    if (mapPtr == MAP_FAILED) {
        perror("mmap");
        return;
    }
    std::vector<unsigned*> startPtrs(omp_get_max_threads());
    // Set where each thread should start.
    for (int i = 0; i < omp_get_max_threads(); i++) {
        // Divide by 4 since unsigned is 4 bytes.
        startPtrs[i] = mapPtr + 25 * GB / 4 * i;
    }
    #pragma omp parallel
    {
        const int thread_id = omp_get_thread_num();
        unsigned* sPtr = startPtrs[thread_id];
        // Generate 25 GB of data.
        for (unsigned i = 0; i < 25; i++) {
            // Do something, generate 1 GB of data.
            const std::vector<unsigned>& my_data = create_1GB_data();
            // Write the data into the shared mapping.
            memcpy(sPtr, my_data.data(), GB);
            sPtr += my_data.size();
        }
    }
    if (munmap(mapPtr, 100 * GB) == -1) {
        perror("munmap");
        return;
    }
    close(file);
}
Alternatively, we can use std::ofstream to write the data. Here are the details:
void Generate_data() {
    #pragma omp parallel
    {
        // Open with in|out so this open() does not truncate what other
        // threads have written; the file must already exist.
        std::ofstream out("disk_cache",
                          std::ios::binary | std::ios::in | std::ios::out);
        // Set where this thread should start.
        out.seekp(25 * GB * omp_get_thread_num(), std::ios::beg);
        // Generate 25 GB of data.
        for (unsigned i = 0; i < 25; i++) {
            // Do something, generate 1 GB of data.
            const std::vector<unsigned>& my_data = create_1GB_data();
            // Write the data to the file.
            out.write(reinterpret_cast<const char*>(my_data.data()), GB);
        }
        out.close();
    }
}
The ofstream approach seems to be faster than mmap(), and I wonder why. I thought mmap() was the most efficient way to write/read amounts of data larger than RAM. Am I using mmap() the right way? Any ideas for improvement? Or is mmap() simply expected to be slow when writing a file?
Updates:
I modified the code to pass sPtr into create_1GB_data() and fill the data in place, which definitely avoids the copy, but that does not seem to be the main reason mmap() is slow. Observing with htop, the mmap() version uses a large amount of RES and SHR memory; I guess that is because I use the MAP_SHARED flag and the entire mapping is shared between threads, although that is not necessary. The ofstream version, on the other hand, does not accumulate nearly as much RES and SHR memory, and each thread seems independent of the others. Should we avoid mmap() in this case?
Any help would be appreciated, thanks in advance.