
I have a program that needs to read through many memory-mapped files at the same time. In an attempt to get the system to fetch the data before it is needed, I am calling madvise with MADV_SEQUENTIAL on each file mapping. However, when the program runs, this seems to have little effect. Instead, the program runs at 100% CPU for a bit without any disk activity whatsoever, and then stalls while it fetches the next chunks from the disk. The number of mapped files can exceed 1000, which I suspect is part of the problem. Am I doing something wrong, or is there a better way of telling the system to start fetching the next chunk of each file before it is actually needed, preferably in chunks of a few MB or so?

camelccc
  • *Am I doing something wrong* Yes. You're using `mmap()` and expecting magic to happen and that thrashing your system's virtual memory mappings won't have any performance impact. [This guy just might know what he's talking about](https://lkml.iu.edu/hypermail/linux/kernel/0004.0/0728.html): "People love mmap() and other ways to play with the page tables to optimize away a copy operation, and sometimes it is worth it. HOWEVER, playing games with the virtual memory mapping is very expensive in itself. ..." – Andrew Henle Apr 07 '23 at 12:40
  • @Andrew Henle I am not expecting magic to happen. I am expecting MADV_SEQUENTIAL to do what it claims. I also don't see the point in referencing articles so old that they predate 64 bit linux when in those days on a 32 bit system mmap consumed valuable and very limited address space, and for that reason needed to be used very sparingly, as otherwise you were asking for memory fragmentation and all the trouble that came with that. – camelccc Apr 07 '23 at 13:07

1 Answer


It seems that the man page of madvise on this platform states:

     MADV_SEQUENTIAL  Causes the VM system to depress the priority of pages
                  immediately preceding a given page when it is faulted
                  in.

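In other words, on this platform the flag only lowers the priority of pages that have already been read; the man page says nothing about reading ahead. The solution is therefore a second thread that reads from the pages about to be accessed and stores the results in a volatile variable, which forces the compiler to actually perform the reads. With that in place, the stalls go away. MADV_SEQUENTIAL does not seem to work reliably for persuading the OS to perform readahead.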
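A minimal sketch of that prefetch thread, assuming POSIX threads and a mapping obtained from mmap elsewhere. The names `prefetch_job`, `touch_pages`, `prefetch_thread` and `start_prefetch` are made up for illustration and are not from my actual program. It reads one byte per page of the upcoming chunk and stores it in a volatile variable so the reads cannot be optimized away:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical description of the chunk that is about to be consumed. */
struct prefetch_job {
    const unsigned char *base; /* start of the chunk inside an mmap'ed file */
    size_t length;             /* size of the chunk in bytes, e.g. a few MB */
};

/* Volatile sink: every store must happen, so every read must happen too. */
static volatile unsigned char prefetch_sink;

/* Read one byte from each page of [base, base + length) so that the kernel
 * faults the page in. Returns the number of pages touched. */
static size_t touch_pages(const unsigned char *base, size_t length)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    size_t touched = 0;
    for (size_t off = 0; off < length; off += (size_t)page) {
        prefetch_sink = base[off];
        touched++;
    }
    return touched;
}

/* Thread entry point: pre-fault the chunk described by the job. */
static void *prefetch_thread(void *arg)
{
    const struct prefetch_job *job = arg;
    touch_pages(job->base, job->length);
    return NULL;
}

/* Start prefetching a chunk in the background. The job must stay valid
 * until the caller has joined the thread with pthread_join. */
static int start_prefetch(pthread_t *tid, struct prefetch_job *job)
{
    return pthread_create(tid, NULL, prefetch_thread, job);
}
```

The main thread would call `start_prefetch` for the next chunk of each file while it is still processing the current one, and `pthread_join` before moving on. Depending on the toolchain, this may need to be built with `-pthread`. With more than 1000 mappings, one thread per file is probably too many, so a single prefetch thread looping over all pending jobs may be the better fit.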

camelccc