
I have the following problem:

I allocate a large chunk of memory (multiple GiB) via mmap with MAP_ANONYMOUS. The chunk holds a large hash map which needs to be zeroed every now and then. The entire mapping may not be used in each round (not every page is faulted in), so memset is not a good idea: it takes too long.

What is the best strategy to do this quickly?

Will

madvise(ptr, length, MADV_DONTNEED);

guarantee me that any subsequent accesses provide new empty pages?
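
For concreteness, a minimal sketch of the pattern in question (the 4 GiB size and the test access are illustrative):

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t length = (size_t)4 << 30;  /* illustrative: 4 GiB */
    unsigned char *map = mmap(NULL, length, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (map == MAP_FAILED)
        return 1;

    map[0] = 42;                      /* fault in and dirty one page */

    /* the call in question */
    if (madvise(map, length, MADV_DONTNEED) != 0)
        return 1;

    printf("%d\n", map[0]);           /* 0 if the pages were replaced */
    return 0;
}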

From the Linux man madvise page:

This call does not influence the semantics of the application (except in the case of MADV_DONTNEED), but may influence its performance. The kernel is free to ignore the advice.

...

MADV_DONTNEED

Subsequent accesses of pages in this range will succeed, but will result either in reloading of the memory contents from the underlying mapped file (see mmap(2)) or zero-fill-on-demand pages for mappings without an underlying file.

...

The current Linux implementation (2.4.0) views this system call more as a command than as advice ...

Or do I have to munmap and remap the region anew?

It has to work on Linux and ideally have the same behaviour on OS X.

Sergey L.
  • I don't have any way to test this, but FWIW, the [OSX](https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/madvise.2.html) man page doesn't mention anything about `madvise`d pages being zero. The [posix](http://pubs.opengroup.org/onlinepubs/009695399/functions/posix_madvise.html) version doesn't either. Is the overhead huge for re-mmapping the memory space? – Collin Sep 03 '13 at 15:38
  • @Collin It's not overly huge performance-wise, but I would need to suspend my threads and, if necessary, update the pointer to a new mapping. That's more parallel code that can go wrong... And I am kind of curious how this call really works. – Sergey L. Sep 03 '13 at 16:00

3 Answers


There is a much easier solution to your problem that is fairly portable:

mmap(ptr, length, PROT_READ|PROT_WRITE, MAP_FIXED|MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

Since MAP_FIXED is permitted to fail for fairly arbitrary implementation-specific reasons, falling back to memset if it returns MAP_FAILED would be advisable.
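
Put together with the fallback, a sketch might look like this (the function name is illustrative):

#include <string.h>
#include <sys/mman.h>

/* Replace the pages with fresh zero-fill-on-demand pages in place;
   the address range stays mapped throughout, so there is no race. */
static void zero_mapping(void *ptr, size_t length)
{
    void *p = mmap(ptr, length, PROT_READ | PROT_WRITE,
                   MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        memset(ptr, 0, length);  /* portable fallback */
}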

R.. GitHub STOP HELPING ICE
  • 208,859
  • 35
  • 376
  • 711
  • Thank you, this is exactly what I was looking for. It even works on my OS X. – Sergey L. Sep 04 '13 at 09:12
  • In case it isn't clear, the idea is to redo `mmap()` with a MAP_FIXED address pointing to part (or all) of the previously-mmapped memory. According to the documentation, this will throw out the previous pages and map in fresh new pages. – Armin Rigo Feb 21 '14 at 07:58
  • How is this better performance-wise? The kernel still needs to zero the page, so this would be worse than memset in userspace: there is the overhead of a system call, executing the mmap logic and then zeroing a page. (Though the OP is probably done with the question, I am still curious. I think memset would still be the best thing to do here.) – Saksham Jain Jul 19 '17 at 16:25
  • @SakshamJain: OP's question is about a large mapping, not a single page. `memset` is `O(length)`. My answer is effectively `O(1)`. – R.. GitHub STOP HELPING ICE Jul 20 '17 at 01:48
  • @R.. As far as I understand, when you mmap, the kernel maps a physical page into your virtual address space. But that physical page might contain old data, so the kernel needs to zero it. So the kernel will be running a memset on all pages that are needed to back the mmap call. – Saksham Jain Jul 21 '17 at 04:16
  • @SakshamJain: "a page" is the start of where you're wrong -- a moderately large mapping could be tens of thousands of pages in length, but after the call in my answer, they do not individually have any physical instantiation; they're all COW references to a shared zero page. If you're going to immediately re-fill all the pages anyway, there's no advantage of avoiding `memset`, since the equivalent will just happen at each page fault time. But if only some will be touched, it makes a big difference. – R.. GitHub STOP HELPING ICE Jul 21 '17 at 23:02

On Linux, you can rely on MADV_DONTNEED on an anonymous mapping zeroing the mapping. This isn't portable, though - madvise() itself isn't standardised. posix_madvise() is standardised, but POSIX_MADV_DONTNEED does not have the same behaviour as the Linux MADV_DONTNEED flag - posix_madvise() is always advisory, and does not affect the semantics of the application.
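
A sketch of how that could be wrapped with a portable fallback (the function name is illustrative):

#include <string.h>
#include <sys/mman.h>

static void zero_pages(void *ptr, size_t length)
{
#ifdef __linux__
    /* Linux: drops the pages; the next fault maps fresh zero pages */
    if (madvise(ptr, length, MADV_DONTNEED) == 0)
        return;
#endif
    memset(ptr, 0, length);  /* elsewhere, fall back to plain zeroing */
}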

caf
  • 233,326
  • 40
  • 323
  • 462

This madvise behavior is certainly not standard, so this wouldn't be portable.

If the part that you want to zero out happens to be at the end of your mapping, you could get away with ftruncate. You'd have to introduce one more step:

  1. shm_open to have a "persistent" file descriptor to your data
  2. ftruncate to the needed size
  3. mmap of that FD

Then you could always

  1. munmap
  2. ftruncate to something short
  3. ftruncate to the real length you need
  4. mmap again

and then the part that you "remapped" would be zero initialized.
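
A sketch of that whole dance (the shm object and function names are illustrative; error checking is omitted for brevity):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* one-time setup: a "persistent" descriptor backing the mapping */
static void *map_create(size_t length, int *fd_out)
{
    int fd = shm_open("/hashmap", O_RDWR | O_CREAT, 0600);
    ftruncate(fd, (off_t)length);
    *fd_out = fd;
    return mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

/* zero the object by shrinking and re-growing the backing file */
static void *map_rezero(void *ptr, size_t length, int fd)
{
    munmap(ptr, length);
    ftruncate(fd, 0);              /* drop the old pages */
    ftruncate(fd, (off_t)length);  /* grow back; new pages read as zero */
    return mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}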

But also keep in mind that the system still has to do the zeroing of the pages. This might be a bit more efficient than the inline code your compiler produces for memset, but that is not certain.

Jens Gustedt
  • 76,821
  • 6
  • 102
  • 177
  • I just need to zero the entire mapping; the length remains the same. If I wanted to go through `munmap` I'd just call `munmap`, then `mmap/MAP_ANONYMOUS`; no need to make it this complicated. What I really want is to not go through a stage where that VM space is temporarily unmapped, but to zero it if the pages are dirty and ideally release the physical RAM until it is reused. – Sergey L. Sep 03 '13 at 16:06
  • `munmap` followed by `mmap` is not safe. It has a race condition: the range will be momentarily unmapped, and another thread may obtain a mapping in the region, or just segfault from trying to access it. See my answer for a safe approach. – R.. GitHub STOP HELPING ICE Sep 03 '13 at 16:19
  • @R.., the question doesn't mention threads. But sure, if you change a mapping under your feet, you have to ensure that no other thread is accessing it. If this is not clear from the application, it would have to be ensured by some sort of locking. But that goes far beyond the question as posed. Your solution has the disadvantage that it depends on implementation-specific behavior. – Jens Gustedt Sep 03 '13 at 18:35
  • No, the only thing implementation-defined is whether it fails, in which case it must report failure, and you have a fallback. As for threads, the unsynchronized access would be problematic anyway, but the race with munmap is real. And IMO answers to questions that don't specify a single thread should not have race conditions. – R.. GitHub STOP HELPING ICE Sep 03 '13 at 20:53
  • It has bugged me ever since you posted this why you suggested resizing the mapping. By "not the entire mapping may be used in each round" I meant that not every page may be touched (faulted in) in each round, but since it's a hash map those touched/untouched pages will be spread around randomly. The size of the mapping stays constant in my application, though. – Sergey L. Sep 04 '13 at 17:14
  • If you are on Linux and can rely on the underlying filesystem supporting it, `fallocate()` with `FALLOC_FL_PUNCH_HOLE` on your `mmap()`ed file would appear to allow you to zero an arbitrary number of pages without any data I/O. – abligh Dec 02 '14 at 07:13
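
For reference, a sketch of that call (assuming a file-backed mapping; `fd`, `offset`, and `length` describe the descriptor and range, and the function name is illustrative):

#define _GNU_SOURCE
#include <fcntl.h>

/* deallocate the blocks in the range; subsequent reads return zeroes */
static int punch_zero(int fd, off_t offset, off_t length)
{
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     offset, length);
}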