
I have the following simple program, which basically just mmaps a file and sums every byte in it:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

volatile uint64_t sink;

int main(int argc, char** argv) {

  if (argc < 3) {
    puts("Usage: mmap_test FILE populate|nopopulate");
    return EXIT_FAILURE;
  }

  const char *filename = argv[1];
  int populate = !strcmp(argv[2], "populate");
  uint8_t *memblock;
  int fd;
  struct stat sb;

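  fd = open(filename, O_RDONLY);
  if (fd == -1) {
    perror("open failed");
    return EXIT_FAILURE;
  }
  if (fstat(fd, &sb) == -1) {
    perror("fstat failed");
    return EXIT_FAILURE;
  }
  uint64_t size = sb.st_size;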

  memblock = mmap(NULL, size, PROT_READ, MAP_SHARED | (populate ? MAP_POPULATE : 0), fd, 0);

  if (memblock == MAP_FAILED) {
    perror("mmap failed");
    return EXIT_FAILURE;
  }

  //printf("Opened %s of size %llu bytes\n", filename, (unsigned long long)size);

  uint64_t i;
  uint8_t result = 0;
  for (i = 0; i < size; i++) {
    result += memblock[i];
  }

  sink = result;

  puts("Press enter to exit...");
  getchar();

  return EXIT_SUCCESS;
}

I make it like this:

gcc -O2 -std=gnu99     mmap_test.c   -o mmap_test

You pass it a file name and either populate or nopopulate[1], which controls whether MAP_POPULATE is passed to mmap. It waits for you to press Enter before exiting, giving you time to look at /proc/<pid> or whatever.

I use a 1GB test file of random data, but you can really use anything:

dd bs=1MB count=1000 if=/dev/urandom of=/dev/shm/rand1g

When MAP_POPULATE is used, I expect zero major faults and a small number of minor faults for a file in the page cache. With perf stat I get the expected result:

perf stat -e major-faults,minor-faults ./mmap_test /dev/shm/rand1g populate
Press enter to exit...

 Performance counter stats for './mmap_test /dev/shm/rand1g populate':

                 0      major-faults                                                
                45      minor-faults                                                

       1.323418217 seconds time elapsed

The 45 faults just come from the runtime and process overhead (and don't depend on the size of the file mapped).

However, /usr/bin/time reports ~15,300 minor faults:

 /usr/bin/time ./mmap_test /dev/shm/rand1g populate
Press enter to exit...

0.05user 0.05system 0:00.54elapsed 20%CPU (0avgtext+0avgdata 977744maxresident)k
0inputs+0outputs (0major+15318minor)pagefaults 0swaps

The same ~15,300 minor faults are reported by top and by examining /proc/<pid>/stat.

Now, if you don't use MAP_POPULATE, all the methods, including perf stat, agree that there are ~15,300 page faults. For what it's worth, this number comes from 1,000,000,000 / 4096 / 16 ≈ 15,259: that is, 1GB divided into 4K pages, with an additional factor-of-16 reduction from a kernel feature ("faultaround") which, when a fault is taken, also maps up to 15 nearby pages that are already present in the page cache.

Who is right here? Based on the documented behavior of MAP_POPULATE, I'd expect the figure returned by perf stat to be the correct one: the single mmap call has already populated the page tables for the entire mapping, so touching it should cause no further minor faults.


[1] Actually, any string other than populate works as nopopulate.

BeeOnRope
  • What are your main concerns with the faults? A [page-fault](https://en.wikipedia.org/wiki/Page_fault) isn't a bad thing; it's housekeeping for the kernel and hardware. – txtechhelp Jan 02 '17 at 18:12
  • Curiosity. Especially when core tools like `perf` and `top` report numbers that differ by more than two orders of magnitude! Thanks for the Wikipedia article, but I understand very well what a page fault is :) – BeeOnRope Jan 02 '17 at 18:14
  • What do you get when you add the `k` (kernel) modifier to the `perf` events, i.e. `perf stat -e major-faults:uk,minor-faults:uk`? – txtechhelp Jan 02 '17 at 18:22
  • 43 minor faults in user space and 4 in kernel space (those don't add up to 45 because the fault counts vary by 2-3 from run to run, not surprisingly). Still zero major faults, of course. – BeeOnRope Jan 02 '17 at 18:24
  • What kernel/distro are you using? – txtechhelp Jan 02 '17 at 18:40
  • @txtechhelp Ubuntu 16.04 on kernel 4.4.0. – BeeOnRope Jan 02 '17 at 18:41
  • You might have to inspect the source of the tools/kernel for this. I've got an older kernel without the `perf` and `time` tools (and can't update right now either), but I suspect it has something to do with how the tools count the faults they report for the mapping. You might get a better response on the [Unix](http://unix.stackexchange.com/) SE site, since this isn't really a "code" error and more of a "why is the kernel doing this" kind of question. – txtechhelp Jan 02 '17 at 19:03
  • What likely happens is that the counter you are looking at counts populated mappings rather than actual page faults. I can't be arsed to check the source, though. If you really want to know whether a page fault happened, you can instrument do_page_fault with SystemTap. –  Jan 02 '17 at 19:28

0 Answers