Use this tag for questions about Intel® VTune™ Profiler, an advanced performance profiler for finding and fixing performance bottlenecks across CPU, GPU, and FPGA systems.
Questions tagged [intel-vtune]
182 questions
4
votes
0 answers
How to monitor the utilization of cores on Xeon Phi at 10Hz?
I've been trying to measure/monitor the utilization of all 60 cores on a Xeon Phi (Knights Corner, in-order processors) at a relatively high frequency, say, at least every 0.1 s, which corresponds to 10 Hz.
I tried the latest PAPI library. But it only…

thierry
- 217
- 2
- 12
4
votes
2 answers
VTune profiling shows no metrics for branch prediction on polymorphic function?
I am analyzing the difference between two designs which process millions of messages. One design uses polymorphism and the other doesn't; each message will be represented by a polymorphic subtype.
I have profiled both designs using VTune. The…

user997112
- 29,025
- 43
- 182
- 361
4
votes
1 answer
Why does g++ (4.6 and 4.7) promote the result of this division to a double? Can I stop it?
I was writing some templated code to benchmark a numeric algorithm using both floats and doubles, in order to compare against a GPU implementation.
I discovered that my floating point code was slower, and after investigating using VTune Amplifier…

amckinley
- 629
- 1
- 7
- 15
3
votes
3 answers
Which compilation option should be set for profiling?
I need to profile an application compiled with Intel's compiler via VC++.
I'm using VTune to profile my code.
My understanding is that in release mode I won't have the debug information
that is necessary for the profiler to profile my code while in…

fulmicoton
- 15,502
- 9
- 54
- 74
3
votes
1 answer
Paradoxical VTune Amplifier microarchitecture exploration results
I am trying to optimize a sin/cos approximation function. At its core there is a simple Horner scheme consisting of a bunch of multiplies and adds. Compiler is MSVC from VS2017, processor is Intel Xeon E5-1650, hyperthreading is on (but observations…

Max Langhof
- 23,383
- 5
- 39
- 72
3
votes
0 answers
Can't see source code in VTune with OpenCV
I am trying to profile OpenCV code using Intel's VTune.
The source code does not show up when I double-click on OpenCV functions in VTune; only the assembly shows up. Non-OpenCV functions do show the source code.
When I go to the platform tab,…

Josh
- 43
- 5
3
votes
0 answers
Optimizing Fortran code with Intel VTune analyzer
I am working on a Fortran project to simulate vegetation dynamics. The code is slow, so I am always on the lookout for ways to optimize it.
I have been reading that there exists a "rule" saying that usually 90% of the time is spent on 10% of the code.…

Manfredo
- 1,760
- 4
- 25
- 53
3
votes
0 answers
Vtune get summary information only
I use Intel VTune to profile code on a Xeon Phi. I use the following command:
amplxe-cl -collect knc-general-exploration ./a.out
The result is a bunch of information along with a new directory containing more information. I'm just interested in a…

arunmoezhi
- 3,082
- 6
- 35
- 54
3
votes
1 answer
Intel Assembler optimization
I'm currently trying to optimize the code emitted from a home-made compiler, for a home-made language.
I've tried out Intel VTune to see where the bottlenecks are: http://www.imada.sdu.dk/~sorenh07/misc/vtune-assembly-optimization.png
I find it very…

Søren Haagerup
- 377
- 1
- 4
- 11
3
votes
1 answer
Decrease in instructions retired after loop unrolling
I have an O(N^4) image processing loop, and after profiling it (using Intel VTune 2013), I see that the number of instructions retired is reduced drastically. I need help understanding this behavior on a multicore architecture. (I'm using Intel Xeon…

quantumshiv
- 97
- 10
3
votes
2 answers
Optimizing code where "problems" are in libc
I have some C++ code and am playing with Intel's VTune. I ran the General Exploration analysis and have no idea how to interpret the results. It flags the number of Retire Stalls as an issue.
On its own, that is enough to confuse me because I'm…

tpg2114
- 14,112
- 6
- 42
- 57
3
votes
1 answer
VTUNE: Cannot display data
I'm using Intel VTune to do some analysis (memory access, access contention, etc.) and I'm getting this error: Cannot display data. The data cannot be displayed: there is no viewport applicable for the data.
I'm using Debian 6, Intel VTune Amplifier…

JohnTortugo
- 6,356
- 7
- 36
- 69
3
votes
1 answer
How to use VTune Analyzer API on Linux
I want to use the VTune Profiler APIs to profile code running on a Xeon Phi (Linux, using offload execution) to see the number of instructions executed, the number of L1 cache misses, etc. But I can't find anything explaining how to use this library.…

Zk1001
- 2,033
- 4
- 19
- 36
2
votes
0 answers
How to profile a C++ shared library written for Python
I am writing a library for some scientific computing tasks, where the core computational routines are written in C++ and pybind11 is used to expose them to the Python side of the library.
How can I profile my C++ code to improve its performance? In…

Fracton
- 171
- 5
2
votes
3 answers
Perf: Could not find a useful description of "branch-load-misses" metric
I'm trying to show that the stalls due to branch misprediction may be reduced due to a certain optimization. My intuition suggests this could be due to a reduction in the stall cycles related to loads that delay the branch outcome.
For this, I was…

Harsh Kumar
- 97
- 6