I'm aware that there are hardware cache events, and also events that count memory requests with large latency (greater than 16, 32, 64, or 128 cycles).
I'd like to know whether it makes sense to use any such metric to estimate the row-buffer locality of a program, or the impact of poor row-buffer locality. In other words, can I show that one program exhibits better row-buffer locality than another on the same memory controller? A list of the relevant events would also be very helpful.
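For concreteness, the kind of workflow I had in mind looks roughly like the following. This is only a sketch: the exact event names vary by microarchitecture (on Intel the latency-threshold events are PEBS-based, on AMD the equivalent mechanism is IBS), so the first step is discovering what this machine actually exposes, and `./program_a` is a placeholder for the program under test.

```shell
# Discover latency-related events exposed on this machine;
# names differ per vendor and microarchitecture.
perf list 2>/dev/null | grep -i latency

# Sample memory accesses together with their measured latencies.
# On Intel this relies on PEBS load-latency sampling; on AMD, on IBS.
perf mem record ./program_a

# Summarize the samples, e.g. grouped by memory level reached,
# to compare the latency profiles of two programs.
perf mem report --sort=mem
```

Comparing the latency histograms of two programs this way would, I assume, reflect row-buffer hits versus misses only indirectly, which is exactly what I'm unsure about.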