I ran VTune on a Xeon Phi core and obtained a CPI of 0.777 for a single-threaded benchmark. However, this seems really unlikely to be true, because the theoretical best (i.e. minimum) CPI is 1.0 for a single thread (search for "Theoretical CPI" on https://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-2-understanding).

I verified that no other threads were running by checking the VTune thread information.

  1. VTune CPI information:

| Function / Call Stack | Clockticks | Instructions Retired | CPI Rate | Retiring | Bad Speculation | Back-End Bound | Front-End Bound | Module | Function (Full) | Source File | Start Address |
|---|---|---|---|---|---|---|---|---|---|---|---|
| centered_3d | 259,622,095,647 | 334,057,786,295 | 0.777 | 0.316 | 0.000 | 0.719 | 0.004 | ef-test | centered_3d | ef_operator.c | 0x420703 |

The CPI is 0.777 from the above information.
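As a sanity check on the arithmetic, here is a minimal Python sketch that recomputes the CPI from the two raw counters in the table above. It assumes VTune derives CPI Rate as clockticks divided by instructions retired. The helper name `cpi` and the constant `BEST_SINGLE_THREAD_CPI` are mine, not anything from VTune, and the 1.0 value is just the single-thread figure I quoted from the Intel article.

```python
# Raw counter values copied from the VTune row for centered_3d.
CLOCKTICKS = 259_622_095_647
INSTRUCTIONS_RETIRED = 334_057_786_295

# Best single-thread CPI as I understand the Intel article (assumption).
BEST_SINGLE_THREAD_CPI = 1.0


def cpi(clockticks: int, instructions: int) -> float:
    """Cycles per instruction: clockticks divided by instructions retired."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return clockticks / instructions


measured_cpi = cpi(CLOCKTICKS, INSTRUCTIONS_RETIRED)
measured_ipc = 1.0 / measured_cpi  # instructions per cycle

print(f"CPI = {measured_cpi:.3f}")
print(f"IPC = {measured_ipc:.3f}")
print("below the single-thread figure:", measured_cpi < BEST_SINGLE_THREAD_CPI)
```

So the reported 0.777 matches the counters themselves, which means that if something is off, it would have to be in the counter values (or in my reading of the theoretical limit) rather than in the division.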

  2. VTune thread information for the function centered_3d:

239.616s -- Simultaneously utilized logical CPUs = 0

163.632s -- Simultaneously utilized logical CPUs = 1

Does the above information imply that VTune is doing some calculations incorrectly? E.g., is it possible that it does not count the number of cycles or the number of instructions correctly?

  • That does look a little strange. Did you look at the INSTRUCTIONS_EXECUTED_V_PIPE counter? V pipe instructions count as an instruction but they sneak in alongside U pipe instructions. I don't think it would make that much of a difference, but still, it might be worth looking at. – froth Aug 29 '15 at 01:00
