I am running Linux on a 32-nm Intel Westmere processor, and I am seeing seemingly conflicting DTLB miss numbers from the performance counters. I ran two experiments with a single-threaded random memory access test program, as follows:
Experiment (1): I counted the DTLB misses using the following performance counter:
DTLB_MISSES.WALK_COMPLETED (Event 49H, Umask 02H)
Experiment (2): I counted the DTLB misses by summing the following two counter values:
MEM_LOAD_RETIRED.DTLB_MISS (Event CBH, Umask 80H)
MEM_STORE_RETIRED.DTLB_MISS (Event 0CH, Umask 01H)
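For reference, here is a minimal sketch of how the three event/umask pairs above map to raw event codes. It assumes the counters are read with the Linux perf tool, whose raw x86 event syntax is rUUEE (umask byte followed by event byte); the post does not depend on any particular tool. The helper name perf_raw_event and the binary name ./random_access_test are hypothetical and used only for illustration.

```python
# Sketch only: assumes Linux perf's raw event syntax (rUUEE) on x86.
# perf_raw_event and ./random_access_test are hypothetical names.

def perf_raw_event(event: int, umask: int) -> str:
    """Build a perf-style raw event string from an event select and a unit mask."""
    if not (0 <= event <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event and umask must each fit in one byte")
    # The unit mask occupies bits 8-15, the event select bits 0-7.
    return f"r{(umask << 8) | event:04x}"

# Event select / umask pairs exactly as listed in the two experiments.
COUNTERS = {
    "DTLB_MISSES.WALK_COMPLETED": (0x49, 0x02),
    "MEM_LOAD_RETIRED.DTLB_MISS": (0xCB, 0x80),
    "MEM_STORE_RETIRED.DTLB_MISS": (0x0C, 0x01),
}

RAW_EVENTS = {name: perf_raw_event(ev, um) for name, (ev, um) in COUNTERS.items()}

if __name__ == "__main__":
    for name, raw in RAW_EVENTS.items():
        print(f"{name}: {raw}")
    # One possible invocation that collects all three counters in a single run.
    print("perf stat -e " + ",".join(RAW_EVENTS.values()) + " ./random_access_test")
```

Collecting all three counters in the same run makes it possible to compare experiment (1) against the sum from experiment (2) on identical execution.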
I expected the results of these two experiments to be similar. However, the count reported in experiment (1) is almost twice the sum reported in experiment (2). I am at a loss as to why this is the case.
Can somebody help shed some light on this apparent discrepancy?