
I have to read data line by line from a large file (more than 7 GB). It contains a list of vertex coordinates and face-to-vertex connectivity information that together form a mesh. I am also learning how to use open and mmap on Linux, and CreateFileA, CreateFileMapping, and MapViewOfFile on Windows. Both the Linux and the Windows builds are compiled as 64-bit.

On Linux (in Docker), built with g++-10 test.cpp -O3 -std=c++17, the run takes around 6 s. On Windows (my actual PC), built with cl (version 19.29.30037 x64) using cl test.cpp /EHsc /O3 /std:c++17, it takes 13 s, and built with clang++-11 (from Visual Studio Build Tools) it takes 11 s.

Both systems (same PC, but one runs in Docker) use exactly the same code, except for the part that produces the const char* pointing to the mapped memory and the uint64_t that represents its size.

This is the way I switch platforms:

// includes for using a specific platform API
#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
// using windows handle void*
#define handle_type HANDLE
#else
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
// using file descriptors
#define handle_type int
#endif

Specifically, the code that exposes the file contents as an array of chars is:

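// standard headers used by the functions below
#include <cstddef>
#include <iostream>
#include <string>

using uint_t = std::size_t;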

// open the file -----------------------------------------------------------------------------
handle_type open(const std::string& filename) {
#ifdef _WIN32
  // see windows file mapping api for parameter explanation
  return ::CreateFileA(filename.c_str(), GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL); // private access
#else
  return ::open(filename.c_str(), O_RDONLY);
#endif
}


// get the memory size to later have a bound for reading -------------------------------------
uint_t memory_size(handle_type fid) {
#ifdef _WIN32
  LARGE_INTEGER size{};
  // GetFileSizeEx returns zero on failure
  if (!::GetFileSizeEx(fid, &size)) {
    std::cerr << "could not query the file size\n";
    return 0;
  }
  return static_cast<uint_t>(size.QuadPart);
#else
  struct stat sb;
  // fstat returns non-zero on failure
  if (fstat(fid, &sb)) {
    std::cerr << "could not query the file size\n";
    return 0;
  }
  return static_cast<uint_t>(sb.st_size);
#endif
}

// get the actual char array to access memory ------------------------------------------------
const char* memory_map(handle_type fid, uint_t memory_size) {
#ifdef _WIN32
  HANDLE mapper = ::CreateFileMapping(fid, NULL, PAGE_READONLY, 0, 0, NULL);
  return reinterpret_cast<const char*>(::MapViewOfFile(mapper, FILE_MAP_READ, 0, 0, memory_size));
#else
  return reinterpret_cast<const char*>(::mmap(NULL, memory_size, PROT_READ, MAP_PRIVATE, fid, 0));
#endif
}
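
One thing the functions above do not do is check for failure, and the two APIs report it differently: MapViewOfFile returns NULL, while mmap returns MAP_FAILED. Below is a minimal sketch of a checked variant, not a fix for the timing difference. The names mapped_file, map_readonly and unmap are hypothetical and not part of the code above, and passing FILE_SHARE_READ instead of 0 is an assumption about the desired sharing mode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#else
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#endif

// Hypothetical helper type: a read-only view of a whole file.
struct mapped_file {
  const char* data = nullptr;  // nullptr means the mapping failed
  std::size_t size = 0;
};

// Maps the whole file read-only; returns an empty mapped_file on any failure
// (missing file, empty file, or a failed mapping call).
mapped_file map_readonly(const std::string& filename) {
  assert(!filename.empty());
  mapped_file result;
#ifdef _WIN32
  HANDLE file = ::CreateFileA(filename.c_str(), GENERIC_READ, FILE_SHARE_READ, nullptr,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
  if (file == INVALID_HANDLE_VALUE) return result;
  LARGE_INTEGER size{};
  if (::GetFileSizeEx(file, &size) && size.QuadPart > 0) {
    HANDLE mapping = ::CreateFileMappingA(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
    if (mapping != nullptr) {
      // A length of 0 maps from the offset to the end of the file.
      void* view = ::MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
      if (view != nullptr) {
        result.data = static_cast<const char*>(view);
        result.size = static_cast<std::size_t>(size.QuadPart);
      }
      ::CloseHandle(mapping);  // the view keeps the mapping object alive
    }
  }
  ::CloseHandle(file);
#else
  int fd = ::open(filename.c_str(), O_RDONLY);
  if (fd < 0) return result;
  struct stat sb;
  if (::fstat(fd, &sb) == 0 && sb.st_size > 0) {
    void* view = ::mmap(nullptr, static_cast<std::size_t>(sb.st_size), PROT_READ,
                        MAP_PRIVATE, fd, 0);
    if (view != MAP_FAILED) {  // mmap signals failure with MAP_FAILED, not NULL
      result.data = static_cast<const char*>(view);
      result.size = static_cast<std::size_t>(sb.st_size);
    }
  }
  ::close(fd);  // the mapping stays valid after the descriptor is closed
#endif
  return result;
}

// Releases a view obtained from map_readonly and resets it to empty.
void unmap(mapped_file& m) {
  if (m.data == nullptr) return;
#ifdef _WIN32
  ::UnmapViewOfFile(m.data);
#else
  ::munmap(const_cast<char*>(m.data), m.size);
#endif
  m = mapped_file{};
}
```

With this shape the caller only has to test data against nullptr on either platform, and no handle or descriptor outlives the call.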

I am completely new to this sort of parsing. Am I choosing the wrong parameters in the Windows API calls (to mimic the behaviour of mmap), or is the difference in time down to the compilers/systems, so that I just have to accept it?

The time to open the file, get its size, and create the memory map is negligible on both Linux and Windows. The rest of the code is identical, as it only operates on the const char* and the size_t.
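
To make "the rest of the code" concrete, here is a minimal sketch of a line scan over the mapped bytes together with a timing helper. It is not my actual parser: count_lines and seconds_taken are hypothetical names used only for illustration. Timing the same scan twice on each platform is one way to separate the cost of faulting pages in (first pass) from the cost of the compiled scanning code (second pass), assuming the file fits in the page cache.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>

// Counts the lines in [data, data + size). A final line that does not end
// in '\n' still counts as a line; an empty buffer has zero lines.
std::size_t count_lines(const char* data, std::size_t size) {
  assert(data != nullptr || size == 0);
  std::size_t lines = 0;
  const char* cursor = data;
  const char* end = data + size;
  while (cursor < end) {
    // memchr finds the next newline without a hand-written byte loop
    const void* hit = std::memchr(cursor, '\n', static_cast<std::size_t>(end - cursor));
    ++lines;
    if (hit == nullptr) break;  // last line has no trailing newline
    cursor = static_cast<const char*>(hit) + 1;
  }
  return lines;
}

// Runs work() once and returns the elapsed wall-clock time in seconds.
template <class F>
double seconds_taken(F&& work) {
  const auto start = std::chrono::steady_clock::now();
  work();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

If the second pass takes about the same time on both systems but the first does not, the gap is more likely in how pages are brought in than in the code the compilers generate.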

Thanks for taking the time to read. Any tip is greatly appreciated and sorry if anything is unclear.

lucmobz
  • There's no question in your "question". – IInspectable Jun 13 '21 at 10:10
  • You are right, I forgot the '?', now it is corrected. – lucmobz Jun 19 '21 at 05:24
  • Adding a question mark to a statement doesn't create a question. If you wish to know whether there is something wrong with your code then accepting an answer that boils down to *"use this other random library I found on the internet and let's pretend that POSIX weren't useless"* is the wrong action. Please take the [tour] to learn how this place works. – IInspectable Jun 19 '21 at 05:30
  • 1
    Thanks for taking the time to look at this, I have unaccepted the answer as it is just a suggestion, the question is exactly what you said. I wished to know if I was doing something wrong with parameters in windows to mimic the behaviour of mmap. – lucmobz Jun 19 '21 at 06:05

1 Answer

Maybe you should take a look at https://github.com/alitrack/mman-win32, which is an mmap implementation for Windows. That way you don't need to write different code for Windows.

Brecht Sanders