While trying to use memory-mapped files to create a multi-gigabyte file (around 13 GB), I ran into what appears to be a problem with mmap(). The initial implementation was done in C++ on Windows using boost::iostreams::mapped_file_sink, and all was well. The code was then run on Linux, and what took minutes on Windows took hours on Linux.
The two machines are clones of the same hardware: Dell R510, 2.4 GHz, 8 MB cache, 16 GB RAM, 1 TB disk, PERC H200 controller.
The Linux machine runs Oracle Enterprise Linux 6.5 with the 3.8 kernel and g++ 4.8.3.
There was some concern that there might be a problem with the Boost library, so implementations were also done with boost::interprocess::file_mapping and with native mmap(); minimal sketches of these two variants follow the C++ code below. All three show the same behavior: Windows and Linux performance is on par up to a certain point, after which the Linux performance falls off badly.
Full source code and performance numbers are linked below.
// C++ code using boost::iostreams
#include <boost/iostreams/device/mapped_file.hpp>
#include <cstddef>
#include <cstdint>
#include <string>

void DoMapping(uint64_t* dest, size_t rowCount); // defined below

void IostreamsMapping(size_t rowCount)
{
    std::string outputFileName = "IoStreamsMapping.out";
    boost::iostreams::mapped_file_params params(outputFileName);
    params.new_file_size = static_cast<boost::iostreams::stream_offset>(sizeof(uint64_t) * rowCount);
    boost::iostreams::mapped_file_sink fileSink(params); // NOTE: this form of the constructor creates and sizes the file.
    uint64_t* dest = reinterpret_cast<uint64_t*>(fileSink.data());
    DoMapping(dest, rowCount);
}
void DoMapping(uint64_t* dest, size_t rowCount)
{
    // inputStream is a std::istream* over the binary input file of
    // interleaved (index, value) uint32 pairs, opened elsewhere in the full source.
    inputStream->seekg(0, std::ios::beg);
    uint32_t index, value;
    for (size_t i = 0; i < rowCount; ++i)
    {
        inputStream->read(reinterpret_cast<char*>(&index), static_cast<std::streamsize>(sizeof(uint32_t)));
        inputStream->read(reinterpret_cast<char*>(&value), static_cast<std::streamsize>(sizeof(uint32_t)));
        dest[index] = value; // random-access write into the mapped output
    }
}
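For comparison, here are minimal sketches of roughly what the two alternate implementations mentioned above look like. The exact code is in the linked source, so the file names and error handling here are illustrative; both variants hand the mapped pointer to the same DoMapping helper.

// C++ sketch using boost::interprocess (illustrative)
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <cstdint>
#include <fstream>

void InterprocessMapping(size_t rowCount)
{
    const std::size_t bytes = sizeof(uint64_t) * rowCount;
    // file_mapping does not create or size the file, so do that first.
    {
        std::ofstream f("InterprocessMapping.out", std::ios::binary);
        f.seekp(static_cast<std::streamoff>(bytes - 1));
        f.put('\0');
    }
    boost::interprocess::file_mapping mapping("InterprocessMapping.out",
                                              boost::interprocess::read_write);
    boost::interprocess::mapped_region region(mapping, boost::interprocess::read_write);
    DoMapping(static_cast<uint64_t*>(region.get_address()), rowCount);
}

// C++ sketch using native mmap() (POSIX side; illustrative)
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

void NativeMmapMapping(size_t rowCount)
{
    const size_t bytes = sizeof(uint64_t) * rowCount;
    int fd = open("NativeMapping.out", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); exit(1); }
    if (ftruncate(fd, static_cast<off_t>(bytes)) != 0) { perror("ftruncate"); exit(1); }
    void* addr = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); exit(1); }
    DoMapping(static_cast<uint64_t*>(addr), rowCount);
    munmap(addr, bytes);
    close(fd);
}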
One final test was done in Python to reproduce the behavior in another language. The fall-off happened at the same place, so it looks like the same problem.
# Python code using numpy
import numpy as np

# inputFile, outputFile and count are defined elsewhere in the full source;
# the input holds interleaved (index, value) uint32 pairs, as in the C++ test.
fpr = np.memmap(inputFile, dtype='uint32', mode='r', shape=(count*2,))
out = np.memmap(outputFile, dtype='uint64', mode='w+', shape=(count,))
print("writing output")
out[fpr[::2]] = fpr[1::2]  # even entries are indices, odd entries are values
For the C++ tests, Windows and Linux have similar performance up to around 300 million int64s (with Linux looking slightly faster). Performance falls off on Linux around 3 GB (400 million * 8 bytes per int64 = 3.2 GB) for both C++ and Python.
I know that on 32-bit Linux 3 GB is a magic boundary (the default user/kernel address-space split), but I am unaware of similar behavior on 64-bit Linux.
The gist of the results: 1.4 minutes on Windows becomes 1.7 hours on Linux at 400 million int64s. I am actually trying to map close to 1.3 billion int64s.
Can anyone explain why there is such a disconnect in performance between Windows and Linux?
Any help or suggestions would be greatly appreciated!
Updated Results (with updated Python code; Python speed is now comparable with C++)
Original Results (NOTE: the Python results are stale)