I have to read some data line by line from a large file (more than 7 GB). It contains a list of vertex coordinates and face-to-vertex connectivity information that together form a mesh. I am also learning how to use open and mmap on Linux, and CreateFileA, CreateFileMapping and MapViewOfFile on Windows. Both the Linux and the Windows versions are compiled as 64-bit.
When I am on Linux (using Docker), g++-10 test.cpp -O3 -std=c++17 gives me around 6s.
When I am on Windows (my actual PC), MSVC (version 19.29.30037, x64) with cl test.cpp /EHsc /O3 /std:c++17 gives me 13s, and clang++-11 (from the Visual Studio Build Tools) gives me 11s.
Both systems (same PC, but one runs inside Docker) use exactly the same code, except for how they obtain the const char* that represents the mapped memory and the uint64_t size that represents the memory size.
This is the way I switch platforms:
// includes for using a specific platform API
#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
// using windows handle void*
#define handle_type HANDLE
#else
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
// using file descriptors
#define handle_type int
#endif
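Not shown above, but for clarity: the two APIs signal failure differently, so a validity check on the returned handle would look roughly like this (sketch only, not part of the code I am timing):

// sketch: checking the handle returned by open() further below;
// CreateFileA returns INVALID_HANDLE_VALUE on failure, ::open returns -1
bool handle_is_valid(handle_type fid) {
#ifdef _WIN32
    return fid != INVALID_HANDLE_VALUE;
#else
    return fid >= 0;
#endif
}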
Specifically, the code that exposes the file contents as an array of chars is:
using uint_t = std::size_t;
// open the file -----------------------------------------------------------------------------
handle_type open(const std::string& filename) {
#ifdef _WIN32
// see windows file mapping api for parameter explanation
return ::CreateFileA(filename.c_str(), GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL); // private access
#else
return ::open(filename.c_str(), O_RDONLY);
#endif
}
// get the memory size to later have a bound for reading -------------------------------------
uint_t memory_size(handle_type fid) {
#ifdef _WIN32
LARGE_INTEGER size{};
if (!::GetFileSizeEx(fid, &size)) {
std::cerr << "file not found\n";
return size.QuadPart;
}
return size.QuadPart;
#else
struct stat sb;
// get the file stats and check if not zero size
if (fstat(fid, &sb)) {
std::cerr << "file not found\n";
return decltype(sb.st_size){};
}
return sb.st_size;
#endif
}
// get the actual char array to access memory ------------------------------------------------
const char* memory_map(handle_type fid, uint_t memory_size) {
#ifdef _WIN32
HANDLE mapper = ::CreateFileMapping(fid, NULL, PAGE_READONLY, 0, 0, NULL);
return reinterpret_cast<const char*>(::MapViewOfFile(mapper, FILE_MAP_READ, 0, 0, memory_size));
#else
return reinterpret_cast<const char*>(::mmap(NULL, memory_size, PROT_READ, MAP_PRIVATE, fid, 0));
#endif
}
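To put the three functions together, the driver looks more or less like this (simplified sketch: parse_mesh is a placeholder for my parser, and the unmap/close calls at the end are what I assume is the correct cleanup on each platform):

// simplified sketch of the driver code (parse_mesh is a placeholder)
std::string filename = "mesh.dat";
handle_type fid = open(filename);
uint_t size = memory_size(fid);
const char* data = memory_map(fid, size);

parse_mesh(data, size); // scans the buffer line by line

#ifdef _WIN32
::UnmapViewOfFile(data);                  // release the mapped view
::CloseHandle(fid);                       // close the file handle
#else
::munmap(const_cast<char*>(data), size);  // release the mapping
::close(fid);                             // close the file descriptor
#endif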
I am completely new to this sort of parsing and was wondering whether I am doing something wrong in choosing the parameters of the Windows API (trying to mimic the behaviour of mmap), or whether the difference in time is simply down to the compilers/systems and I have to accept it.
The actual time spent opening the file, getting the memory size and mapping the memory is negligible on both Linux and Windows; the rest of the code is identical, as it only operates on the const char* and size_t info.
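To give an idea of what that remaining code does with the buffer, it is essentially a forward scan like this (heavily simplified sketch; the real loop converts the tokens of each line into vertex coordinates and face indices instead of just counting newlines):

// heavily simplified sketch of the scanning pattern used by the parser
uint_t count_lines(const char* data, uint_t size) {
    uint_t lines = 0;
    for (uint_t i = 0; i < size; ++i) {
        if (data[i] == '\n') ++lines; // the real code parses the line contents here
    }
    return lines;
}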
Thanks for taking the time to read. Any tip is greatly appreciated and sorry if anything is unclear.