
I have a big .txt file (over 1 GB). While searching for a way to open it quickly, I came across memory mapping.

I managed to use CreateFile(), then I made a char buffer[] and finally put the file contents into the buffer with ReadFile(). The problem is that the file is too big, so I can't load it all at once into the buffer, because I can't make an array that big.

I think the solution would be to open the file at specified offsets and read a small portion of the contents each time. The only source I found explaining mapping was on MSDN, but I can't work out how to do it from there.

So, in the end, how do I read a big file with a memory mapping?

HANDLE my_File = CreateFileA("words.txt", GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    
if (my_File == INVALID_HANDLE_VALUE)
{
    cout << "Failed to open file" << endl;
    return 0;
}
    
constexpr size_t BUFFSIZE = 1000000;
    
char buffer[BUFFSIZE];
DWORD dwBytesToRead = BUFFSIZE - 1;
DWORD dwBytesRead = 0;
    
BOOL my_Bool = ReadFile(my_File,(void*)buffer, dwBytesToRead, &dwBytesRead, NULL);
    
if (dwBytesRead > 0)
{
    buffer[dwBytesRead] = '\0';
    cout << "FILE IS: " << buffer << endl;
}
    
CloseHandle(my_File);
Jesepy
  • ***The problem is that the file is too big, so I can't load it all at once into the buffer, because I can't make an array that big.*** I assume for some reason you are creating a 32-bit application. – drescherjm Jun 09 '21 at 00:01
  • Do you need the whole file? Often you can read and process a block of data and then read the next block into the same storage you used for the first block, process it, and repeat until you hit the end of the file. – user4581301 Jun 09 '21 at 00:01
  • How can I do that? – Jesepy Jun 09 '21 at 00:05
  • Often a better way to ask a question like this is to show what you think is your best attempt in the question. This gives answerers a much better starting point for answers. Maybe you misinterpreted a minor detail and can be handed a quick answer. Maybe the attempt wandered off the right path and found its way to Narnia, but an answerer can show you where you got lost and give a nudge in the right direction. If you show nothing answers don't know where to start or where to stop and wind up with sprawling answers that cover too much ground for you to understand in one sitting. – user4581301 Jun 09 '21 at 00:05
  • Back in days of old, before color printing and iphones, computers didn't have a lot of memory. Files could store more data than memory could hold. So the ancient method was to input a chunk of data from the file, process that chunk, then output the processed data to a file. You may want to consider using the technique. You can also use *double buffering* by having a read-thread read data into a buffer and when the buffer is full, read to another buffer. After the first buffer is full, the other thread would process the data. Use enough buffers to adjust speed issues. – Thomas Matthews Jun 09 '21 at 00:09
  • If you need to move around a lot in the file (for example, data near the end of the file tells you that you now need to read data at the beginning), reading in chunks is not such a good idea. But if you know that all of the data is sequential, see if the [2019 rethink in this answer](https://stackoverflow.com/a/36658802/4581301), placed inside a loop that repeats until the end of the file is reached, works for you. – user4581301 Jun 09 '21 at 00:12
  • I added my code. – Jesepy Jun 09 '21 at 00:26
  • You can read from the file into a buffer, or you can map part of the file into memory - so what exactly is the question? – RbMm Jun 09 '21 at 00:27
  • You have the *FileOffset* and *dwNumberOfBytesToMap* parameters in the call to *MapViewOfFile* - these are almost equivalent to the parameters of *ReadFile*, except that the offset is rounded down to the nearest allocation-granularity boundary. – RbMm Jun 09 '21 at 00:35
  • 1
    Note: Windows systems typically have a default stack size of 1 megabyte. `char buffer[BUFFSIZE];` is about 1 megabyte, making the program dangerously close to stack overflow and being unstable all by itself. If you dynamically allocate the buffer or make it `static` so that it's not on the stack, you won't suffer stack overflow as easily and can get a much bigger buffer. Probably orders of magnitude bigger. – user4581301 Jun 09 '21 at 00:58

1 Answer


I think you are confused. The whole purpose of mapping part or all of a file into memory is to avoid the need to buffer the data yourself. Instead, the OS takes care of that for you, allowing you to access the contents of the file via a pointer, just like you would any other in-memory data structure.

Only you can decide if that's the best solution for you. In a 32-bit app, 1 GB is a lot of address space to find. In a 64-bit app there is no such problem. As mentioned in the comments, reading the file in chunks into a smaller buffer can be a better bet, especially if you want to process it sequentially.


For some example code on how to memory map a file, see:

How to CreateFileMapping in C++?
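As a rough illustration of the chunked-view idea from the comments, here is a sketch of mapping a large file one window at a time with a sliding view. This is an assumption-laden outline, not production code: error handling is abbreviated, the 16-view chunk size is arbitrary, and `words.txt` is the asker's file name. Note that each view's offset must be a multiple of the system allocation granularity (usually 64 KiB), which the chunk size guarantees here.

```cpp
#include <windows.h>

int main()
{
    HANDLE hFile = CreateFileA("words.txt", GENERIC_READ, FILE_SHARE_READ,
                               NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
        return 1;

    LARGE_INTEGER size;
    GetFileSizeEx(hFile, &size);

    // Map the whole file object, but only view small windows of it.
    HANDLE hMap = CreateFileMappingA(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
    if (!hMap) { CloseHandle(hFile); return 1; }

    SYSTEM_INFO si;
    GetSystemInfo(&si);
    const ULONGLONG granularity = si.dwAllocationGranularity; // usually 64 KiB
    const ULONGLONG chunk = 16 * granularity; // 1 MiB views; keeps offsets aligned

    for (ULONGLONG offset = 0; offset < (ULONGLONG)size.QuadPart; offset += chunk)
    {
        ULONGLONG bytes = min(chunk, (ULONGLONG)size.QuadPart - offset);
        const char* view = (const char*)MapViewOfFile(
            hMap, FILE_MAP_READ,
            (DWORD)(offset >> 32),         // high 32 bits of the file offset
            (DWORD)(offset & 0xFFFFFFFF),  // low 32 bits of the file offset
            (SIZE_T)bytes);
        if (!view)
            break;
        // ... process view[0 .. bytes-1] here ...
        UnmapViewOfFile(view);
    }

    CloseHandle(hMap);
    CloseHandle(hFile);
}
```

Because only one small view exists at a time, this pattern also works in a 32-bit process, as Remy notes in the comments below.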

Paul Sanders
  • I used fstream to open and read the txt file into an array. With a small txt it worked well, but with the big one it never ends. The only solution I found was mapping, but I don't know how to do it. – Jesepy Jun 09 '21 at 00:16
  • Added something to my answer. – Paul Sanders Jun 09 '21 at 00:17
  • I've found that post too when searching, but I still couldn't allocate enough memory with `new` for the file. – Jesepy Jun 09 '21 at 00:25
  • If you memory map the file you don't need to allocate any memory yourself. Just use the pointer returned by `MapViewOfFile` to access the data in the file. But you should build your app as 64 bit, otherwise `MapViewOfFile` is likely to fail. – Paul Sanders Jun 09 '21 at 00:28
  • 1
    @Jesepy "*but with the big one its never ends*" - then you were likely using it wrong, but we can't see what you attempted. – Remy Lebeau Jun 09 '21 at 00:30
  • 1
    @PaulSanders "*you should build your app as 64 bit*" - ideally, if possible. But there is nothing stopping a 32bit app from reading GB-sized files using a file mapping. I once wrote a 32bit log viewer that used file mappings to view GB-sized log files, it worked fine. Just don't map a view of the entire file at one time, view it in smaller chunks, and move the view around as needed. – Remy Lebeau Jun 09 '21 at 00:32
  • @RemyLebeau Well that's true, but inconvenient. – Paul Sanders Jun 09 '21 at 00:33