I've written a few implementations of an algorithm, each with different micro-optimizations. I need to count the number of instructions executed by a call, or between two points in the code (before and after the call).
The algorithm uses a few loops and conditional jumps, and it is data-sensitive, so I can't just work out the number of instructions per loop iteration and multiply it by the number of iterations.
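To illustrate why a static estimate breaks down, here is a minimal sketch using insertion sort as a stand-in (it is not the actual algorithm, and the function name `insertion_sort_steps` is made up for this example). It counts how many times the inner loop body runs, as a rough proxy for executed instructions, and shows that two inputs of the same length can give very different counts.

```python
def insertion_sort_steps(values):
    """Sort a copy of `values` and return how many times the inner loop body ran."""
    data = list(values)
    shifts = 0
    for i in range(1, len(data)):
        key = data[i]
        j = i - 1
        # Data-dependent loop: the number of iterations depends on the input order.
        while j >= 0 and data[j] > key:
            data[j + 1] = data[j]
            j -= 1
            shifts += 1
        data[j + 1] = key
    return shifts


sorted_input = [1, 2, 3, 4, 5, 6, 7, 8]
reversed_input = [8, 7, 6, 5, 4, 3, 2, 1]

print(insertion_sort_steps(sorted_input))    # 0
print(insertion_sort_steps(reversed_input))  # 28
```

Both inputs have eight elements, yet the inner loop runs 0 times for one and 28 times for the other, so a fixed "instructions per iteration" figure multiplied by the outer iteration count would be wrong for at least one of them.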
Disclaimer: I know that the number of executed instructions isn't very meaningful, because the same instructions perform differently on different CPUs, but this is for demonstration purposes only.