How many minor faults is my process really taking?

Question

I have the following simple program, which basically just mmaps a file and sums every byte in it:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

volatile uint64_t sink;

int main(int argc, char** argv) {

  if (argc < 3) {
    puts("Usage: mmap_test FILE populate|nopopulate");
    return EXIT_FAILURE;
  }

  const char *filename = argv[1];
  int populate = !strcmp(argv[2], "populate");
  uint8_t *memblock;
  int fd;
  struct stat sb;

  fd = open(filename, O_RDONLY);
  if (fd == -1) {
    perror("open failed");
    return EXIT_FAILURE;
  }
  if (fstat(fd, &sb) == -1) {
    perror("fstat failed");
    return EXIT_FAILURE;
  }
  uint64_t size = sb.st_size;

  memblock = mmap(NULL, size, PROT_READ, MAP_SHARED | (populate ? MAP_POPULATE : 0), fd, 0);

  if (memblock == MAP_FAILED) {
    perror("mmap failed");
    return EXIT_FAILURE;
  }

  //printf("Opened %s of size %llu bytes\n", filename, (unsigned long long)size);

  uint64_t i;
  uint8_t result = 0;
  for (i = 0; i < size; i++) {
    result += memblock[i];
  }

  sink = result;

  puts("Press enter to exit...");
  getchar();

  return EXIT_SUCCESS;
}

I build it like this:

gcc -O2 -std=gnu99 mmap_test.c -o mmap_test

You pass it a file name and either populate or nopopulate¹, which controls whether MAP_POPULATE is passed to mmap. It waits for you to press Enter before exiting, which gives you time to inspect /proc/<pid> or anything else.

I use a 1GB test file of random data, but you can really use anything:

dd bs=1MB count=1000 if=/dev/urandom of=/dev/shm/rand1g

When MAP_POPULATE is used, I expect zero major faults and only a small number of minor faults for a file that is already in the page cache. With perf stat I get the expected result:

perf stat -e major-faults,minor-faults ./mmap_test /dev/shm/rand1g populate
Press enter to exit...

 Performance counter stats for './mmap_test /dev/shm/rand1g populate':

                 0      major-faults                                                
                45      minor-faults                                                

       1.323418217 seconds time elapsed

The 45 faults just come from the runtime and process overhead (and don't depend on the size of the file mapped).

However, /usr/bin/time reports ~15,300 minor faults:

 /usr/bin/time ./mmap_test /dev/shm/rand1g populate
Press enter to exit...

0.05user 0.05system 0:00.54elapsed 20%CPU (0avgtext+0avgdata 977744maxresident)k
0inputs+0outputs (0major+15318minor)pagefaults 0swaps

The same ~15,300 minor faults are reported by top and in /proc/<pid>/stat.

Now if you don't use MAP_POPULATE, all the methods, including perf stat, agree that there are ~15,300 page faults. For what it's worth, this number comes from 1,000,000,000 / 4096 / 16 ≈ 15,250: 1 GB divided into 4K pages, reduced by a further factor of 16 by a kernel feature ("faultaround") that, when a fault is taken, also maps up to 15 nearby pages that are already present in the page cache.

Who is right here? Based on the documented behavior of MAP_POPULATE, the figure reported by perf stat should be the correct one: the single mmap call has already populated the page tables for the entire mapping, so there should be no further minor faults when touching it.

¹Actually, any string other than populate works as nopopulate.

What are your main concerns with the faults? A [page fault](https://en.wikipedia.org/wiki/Page_fault) isn't a bad thing; it's housekeeping for the kernel and hardware. — txtechhelp, Jan 02 '17 at 18:12
Curiosity. Especially when core tools like `perf` and `top` report numbers different by more than two orders of magnitude! Thanks for the wikipedia article, but I understand very well what a page fault is :) — BeeOnRope, Jan 02 '17 at 18:14
What do you get when you add the `k` (kernel) modifier to the `perf` events, e.g. `perf stat -e major-faults:uk,minor-faults:uk`? — txtechhelp, Jan 02 '17 at 18:22
43 minor faults in user space and 4 in kernel space (those don't add up to 45 because the fault counts jump around by 2-3 on every run, not surprisingly). Still zero major faults, of course. — BeeOnRope, Jan 02 '17 at 18:24
You might have to inspect the source of the tools or the kernel for this. I've got an older kernel without the `perf` and `time` tools (and can't update right now), but I suspect it's something about how the tools count the faults they report for the mapping. You might get a better response on the [Unix](http://unix.stackexchange.com/) SE site, since this isn't really a "code" error and more of a "why is the kernel doing this" kind of question. — txtechhelp, Jan 02 '17 at 19:03
What likely happens is that the counter you are looking at counts populated mappings rather than real page faults. I can't be arsed to check the source, though. If you really want to know whether a page fault happened, you can instrument do_page_fault with SystemTap. — , Jan 02 '17 at 19:28
