How can I use Intel PIN to catch all loads to an array?

Question

I'm using PIN to profile an application I have written. The source code of the application uses an array, and I want PIN to catch every load instruction that reads from that array.

Currently, I have annotated the source code of the application I am trying to profile. Every time I read from the array, I first call a function startRegionOfInterest(). Once I finish reading from the array I call another function endRegionOfInterest(). I can use PIN to easily catch calls to these two functions - whenever a load exists between these two calls I assume it's a load to the array I'm interested in.

However, this is pretty coarse-grained, so I end up classifying a lot of loads that are NOT to the array of interest as reads of the array.

Is there an easier way for me to more precisely catch all loads made to the array I'm interested in?

I assume you know the array's address and length? I don't know PIN well, but presumably it can decode an instruction to find out the effective address from its addressing mode. — Peter Cordes, Jun 29 '20 at 01:38
@PeterCordes I have the array's length, but how would I go about retrieving the address using PIN? — Farhad, Jun 29 '20 at 01:47
If it's statically allocated, there should be a symbol associated with it. If not, you'd need to find a load that loads the address somehow. If there's a specific addressing mode you know the compiler uses, you could use that. Something brittle that just happens to match a certain build could be fine for one-off usage. Perhaps a marker like a specific NOP encoding could help you find it with PIN. — Peter Cordes, Jun 29 '20 at 01:59
Sounds like you need some more detail in your question about how your program accesses this array, and what kind of array it is. — Peter Cordes, Jun 29 '20 at 02:22

score 2 · Accepted Answer · answered Jun 29 '20 at 03:10

In your startRegionOfInterest method, you can use some kind of indicator sequence to pass the array address to your PIN tool. E.g., store a magic constant, then store the array address, something like:

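/* The pointer itself must be volatile so that both stores are emitted;
   "volatile void *sink" would only qualify the pointed-to data. */
void *volatile sink;

void startRegionOfInterest(void *array) {
    sink = (void *)0x48829d2f384be;  /* magic constant: marks the start of the sequence */
    sink = array;                    /* the next store carries the array address */
}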

In your PIN tool, you look for a store of the magic constant (within the startRegionOfInterest call for extra specificity, if you want), and then you know the next store is the address of the array. You can communicate the length similarly.
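
The detection logic described above can be sketched as a small state machine. This is only an illustration in plain C++, not a complete PIN tool: the class name `ArrayLoadTracker` and its methods are hypothetical, and the sketch assumes the tool already has some way to observe the value written by each store and the effective address of each load (for example from analysis routines attached to memory instructions). The array length is assumed to be known up front and is passed in directly rather than communicated through the marker sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Must match the constant stored by startRegionOfInterest() in the application.
constexpr std::uint64_t kMagic = 0x48829d2f384beULL;

// Hypothetical helper: tracks the marker sequence and classifies loads.
// In a real tool, onStore/onLoad would be driven by analysis callbacks.
class ArrayLoadTracker {
public:
    explicit ArrayLoadTracker(std::size_t lengthBytes) : length_(lengthBytes) {
        assert(lengthBytes > 0);
    }

    // Feed the value written by each store, in program order.
    void onStore(std::uint64_t value) {
        if (armed_) {
            // The previous store was the magic constant, so this store
            // carries the array's base address.
            base_ = value;
            haveBase_ = true;
            armed_ = false;
        } else if (value == kMagic) {
            armed_ = true;
        }
    }

    // Feed the effective address of each load; returns true if the
    // address falls inside [base, base + length).
    bool onLoad(std::uint64_t addr) {
        if (!haveBase_) return false;
        // Unsigned subtraction wraps for addr < base_, which yields a huge
        // value and therefore fails the comparison, as intended.
        bool hit = (addr - base_) < length_;
        if (hit) ++hits_;
        return hit;
    }

    std::uint64_t hits() const { return hits_; }

private:
    std::size_t length_;
    std::uint64_t base_ = 0;
    std::uint64_t hits_ = 0;
    bool armed_ = false;
    bool haveBase_ = false;
};
```

Because only stores advance the state machine, any non-store instructions the compiler places between the two marker stores are ignored automatically. Note that a load which starts just before the array and overlaps its first bytes would not be counted by this simple check.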

Implementing the sequence with inline asm instead removes the variability associated with compiler and optimizer behavior, although I think the volatile approach should work in practice (you might have to skip some intervening non-store instructions). A godbolt.

Thanks, this should work but what I ended up doing was writing a function in the program I was profiling that took the array start and end address as arguments. I then found this function in Pin by instrumenting all function calls, and read the arguments to find the start and end address of my array. — Farhad, Jun 29 '20 at 04:22
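
For illustration, the application side of the approach in this comment might look roughly like the sketch below. Everything in it is an assumption rather than the commenter's actual code: the function name `registerArrayBounds` and the two globals are hypothetical, and the `noinline` attribute assumes GCC or Clang. The idea is only that the call must survive optimization with both arguments intact, so that a tool instrumenting function entry can read them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical globals: writing to volatile objects gives the function an
// observable side effect, so the compiler keeps its body and its arguments.
volatile std::uintptr_t g_roiStart = 0;
volatile std::uintptr_t g_roiEnd = 0;

// extern "C" avoids C++ name mangling, so the symbol name is exactly
// "registerArrayBounds". noinline (GCC/Clang) keeps a real call instruction
// in the binary for the tool to find.
extern "C" __attribute__((noinline))
void registerArrayBounds(const void *start, const void *end) {
    const std::uintptr_t s = reinterpret_cast<std::uintptr_t>(start);
    const std::uintptr_t e = reinterpret_cast<std::uintptr_t>(end);
    assert(s <= e);  // sanity check: end must not precede start
    g_roiStart = s;
    g_roiEnd = e;
}
```

The program would call it once, e.g. `registerArrayBounds(data, data + n)`, before the loads of interest. The tool can then treat any load whose effective address lies in `[start, end)` as a load from the array.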
