nvprof supports both events (raw counters) and metrics. These can be queried using the following commands:
nvprof --query-events
nvprof --query-metrics
CC5.*/6.* Local Memory Metrics
- local_load_transactions_per_request: Average number of local memory load transactions performed for each local memory load
- local_store_transactions_per_request: Average number of local memory store transactions performed for each local memory store
- local_load_transactions: Number of local memory load transactions
- local_store_transactions: Number of local memory store transactions
- local_hit_rate: Hit rate for local loads and stores
- local_memory_overhead: Ratio of local memory traffic to total memory traffic between the L1 and L2 caches expressed as percentage
- local_load_throughput: Local memory load throughput
- local_store_throughput: Local memory store throughput
- inst_executed_local_loads: Warp level instructions for local loads
- inst_executed_local_stores: Warp level instructions for local stores
- l2_local_load_bytes: Bytes read from L2 for misses in Unified Cache for local loads
- l2_local_global_store_bytes: Bytes written to L2 from Unified Cache for local and global stores. This does not include global atomics.
- local_load_requests: Total number of local load requests from Multiprocessor
- local_store_requests: Total number of local store requests from Multiprocessor
local_*_requests is the number of local memory instructions executed, whether issued through the generic address space or the local address space. On CC5.*/6.* I do not recall if this includes fully predicated-off instructions.
local_*_transactions is the number of cache accesses the requests generated; it depends on the size (32-bit, 64-bit, ...) of each request and on its address divergence. If this is non-zero then local memory was accessed.
l2_local_*_bytes is the number of bytes of data loaded from or stored to the L2 cache.
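
To make the relationship between the request and transaction counters concrete, here is a minimal Python sketch. It assumes the per-request metric is simply the transaction count divided by the request count, which the metric descriptions suggest but nvprof's output is the authority on. The function name and the counter values are hypothetical and not taken from a real profile.

```python
def transactions_per_request(transactions: int, requests: int) -> float:
    """Average number of transactions per warp-level request.

    Returns 0.0 when no requests were issued, to avoid dividing by zero.
    Assumption: the metric is the plain ratio of the two counters.
    """
    if requests == 0:
        return 0.0
    return transactions / requests


# Hypothetical counter values, as if read from nvprof output.
local_load_requests = 1000
local_load_transactions = 4000

ratio = transactions_per_request(local_load_transactions, local_load_requests)

# A non-zero transaction count means the kernel touched local memory.
uses_local_memory = local_load_transactions > 0

print(f"local_load_transactions_per_request = {ratio:.2f}")
print(f"local memory accessed: {uses_local_memory}")
```

A higher ratio means each request was split into more cache accesses, which per the description above comes from wider accesses or more address divergence.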