I am trying to use Intel PMU performance monitoring (PEBS) to sample all LOAD and STORE operations in a C/C++ application binary. The codebase I am using calls perf_event_open() and selects either LOAD or STORE sampling through the attr->config field, as shown in the code snippet below. I want to add another switch case that samples LOAD_AND_STORE operations, but I don't know which hex value to put in attr->config for the Intel PMU, analogous to the values already present in the snippet for LOAD and STORE. I would appreciate any pointers or help. Thanks in advance.

    switch(aType)
    {
        case LOAD:
        {
/* comment out by Me
//          attr->config                 = 0x1cd;
#if defined PEBS_SAMPLING_L1_LOAD_MISS
            //attr->config                 = 0x5308D1; // L1 load miss
            attr->config                 = 0x8d1; // perf stat -e mem_load_uops_retired.l1_miss -vvv ls  // for broadwell
#elif defined PEBS_SAMPLING_LLC_LOAD_MISS
            attr->config                 = 0x5320D1; // LLC load miss
#else
            attr->config                 = 0x5381d0; //All Load
#endif
*/
//          attr->config                 = 0x5308D1; // L1 load miss
//          attr->config                 = 0x5320D1; // LLC load miss
//          attr->config1                = 0x3;

            // added by me
            attr->config                 = 0x5381d0; //All Load added by me
            attr->precise_ip             = 3;
            load_flag = true;
            break;
        }
        case STORE:
        default:
        {
            attr->config                 = 0x5382d0;//0x2cd;
//          attr->config                 = 0x8d1;   //mem_load_uops_retired.l3_miss
//          attr->config1                = 0x0;
            attr->precise_ip             = 3;
            store_flag = true;
            break;
        }
    }

    attr->read_format            = PERF_FORMAT_GROUP | PERF_FORMAT_ID;
//  attr->task                   = 1;

    // fresh creation
//  return registerDevice(sessionId);
}
Peter Cordes
Azad Md Abul Kalam
  • 1
    You can of course use two separate counters, one for loads and one for stores, but good question if you can count L3 misses or whatever for both loads and stores with the same counter. I wouldn't be surprised if you can't; loads access cache in the load execution unit; stores access cache only when they reach the head of the store buffer. – Peter Cordes Dec 08 '22 at 15:42
  • Thank you so much @PeterCordes. Actually, I was wondering how I could set two different events. For LOAD, I set the perf config to `attr->config = 0x5381d0` and PEBS watches that event; if I set it to 0x5382d0, it watches for STORE. But is there a hex value for the event type LOAD_AND_STORE? Thanks again, BTW. – Azad Md Abul Kalam Dec 14 '22 at 23:59
  • Yeah, I realize that's what you're asking, I was just suggesting a workaround. I upvoted the question so maybe someone who knows for sure that there isn't a LOAD_AND_STORE event type can notice and answer it. – Peter Cordes Dec 15 '22 at 00:24
  • You can count all L3 cache misses with `LONGEST_LAT_CACHE.MISS`; see [definition of linux perf cache-misses event?](https://stackoverflow.com/q/60009988). That counts by cache line, not by instruction, and includes at least some HW prefetch. (Including code fetch, loads, RFOs from stores, and presumably page walks.) But `perf list` doesn't mention it as a Precise event. IDK if that means it can't use PEBS, or if it just isn't associated with a specific instruction. – Peter Cordes Dec 21 '22 at 22:06

1 Answer

Yes, there is a way to measure all "LOAD_AND_STORE" instructions using the PEBS facility.

The raw event you are looking for is MEM_INST_RETIRED.ANY. The specification for this event for the Skylake microarchitecture is defined here.

The umask for this event is 0x83 and the event code is 0xD0, so the perf event config you are looking for is attr->config = 0x5383d0.
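To make the arithmetic behind these hex values explicit, here is a minimal sketch of how they can be composed. It assumes the Intel IA32_PERFEVTSELx layout (event code in bits 0-7, umask in bits 8-15, and the USR, OS, INT and EN control bits that together form the leading 0x53 byte used in the question). The helper names `raw_event` and `raw_event_with_flags` and the `EVTSEL_*` macros are hypothetical, made up for illustration; they are not part of perf or of the asker's codebase.

```c
#include <stdint.h>

/* Control bits in the Intel IA32_PERFEVTSELx layout (macro names are illustrative). */
#define EVTSEL_USR (1ULL << 16)  /* count while in user mode */
#define EVTSEL_OS  (1ULL << 17)  /* count while in kernel mode */
#define EVTSEL_INT (1ULL << 20)  /* raise an interrupt on counter overflow */
#define EVTSEL_EN  (1ULL << 22)  /* enable the counter */

/* Hypothetical helper: event code goes in bits 0-7, umask in bits 8-15. */
static inline uint64_t raw_event(uint8_t event_code, uint8_t umask)
{
    return ((uint64_t)umask << 8) | event_code;
}

/* Hypothetical helper: same encoding plus the 0x53 control byte from the question. */
static inline uint64_t raw_event_with_flags(uint8_t event_code, uint8_t umask)
{
    return raw_event(event_code, umask)
         | EVTSEL_USR | EVTSEL_OS | EVTSEL_INT | EVTSEL_EN;
}
```

With event code 0xD0, `raw_event_with_flags(0xD0, 0x81)` gives the question's load value 0x5381d0, umask 0x82 gives the store value 0x5382d0, and umask 0x83 gives 0x5383d0. Note that 0x83 is simply 0x81 | 0x82, which is consistent with the combined event covering both loads and stores.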

Arnabjyoti Kalita
  • 2,325
  • 1
  • 18
  • 31
  • 1
    I think they're also wanting to count any load/store that misses in L3, but hopefully this is also useful for part of what they want. I just noticed the question doesn't actually ask about counting cache misses, just counting load/store instructions at all; only the code block hints at counting cache misses. – Peter Cordes Dec 24 '22 at 22:05
  • 1
    Thanks a lot! @Arnabjyoti Kalita I'm going to check your suggestion. – Azad Md Abul Kalam Dec 24 '22 at 23:30
  • 1
    This works! Thanks again! Kalita, Thanks for upvoting @Peter Cordes – Azad Md Abul Kalam Dec 25 '22 at 00:09