Use this tag to ask questions about Intel® VTune™ Profiler, an advanced performance profiler for finding and optimizing performance bottlenecks across CPU, GPU, and FPGA systems.
Questions tagged [intel-vtune]
182 questions
2 votes, 1 answer
Intel VTune Profiler shows __mulq is a computationally expensive function in a Fortran code
I'm trying to perform an audit on a rather complicated multi-physics model I'm working on and have been using Intel VTune Profiler to identify expensive subroutines. The most expensive function is a function called __mulq which is not something…

Liam Magargal
2 votes, 1 answer
Intel VTune Profiler: Remote Profiling with Sudo
I'm using Intel VTune to profile a remote application that requires sudo access on another machine. I have previously been able to profile remote applications on that machine that do not require sudo access, but Intel VTune is not working for…

MUAS
2 votes, 2 answers
amplxe-sepreg.exe missing from VTune
When I try to use hardware event-based profiling in VTune (Profiler 2020), I get the error message:
Cannot enable Hardware Event-based Sampling due to a problem with the driver (sep*/sepdrv*). Check that the driver is running and the driver group is…

me'
2 votes, 0 answers
Why is 'add' taking so long in my application?
I'm profiling an application using Intel VTune, and there is one particular hotspot where I'm copying a __m128i member variable in the copy constructor of a C++ class.
VTune gives this breakdown:
Instruction CPU Time: Total …

Thomas Johnson
2 votes, 3 answers
How do I generate symbol information to use with Linux version of Intel's VTune Amplifier?
I am using Intel VTune Amplifier XE 2011 to analyze the performance of my program. I want to be able to view the source code in the analysis results, and the documentation says I need to provide the symbol information. Unfortunately, it does not…

Dylan Klomparens
2 votes, 1 answer
OpenCV build in debug mode with optimizations?
I'm trying to profile OpenCV using Intel VTune Amplifier. On this page, there is a list of compiler options suggested to obtain the best analysis.
As you can see, it's a mix of debug flags (e.g. -g) and optimization flags (e.g. -O2 or higher), so we…

justHelloWorld
2 votes, 2 answers
Interpreting Intel VTune's Memory Bound Metric
I see the following when I run Intel VTune on my workload:
Memory Bound 50.8%
The Intel documentation says:
Memory Bound measures a fraction of slots where pipeline could be stalled due to demand load…

Frank
2 votes, 1 answer
Profile C++ program based on wall-clock time with Intel VTune Amplifier
I've just started using Intel VTune Amplifier XE, and it looks like it measures only CPU time by default. Is it possible to set up VTune to get results based on wall-clock time (real time)?
Actually my goal is to get hotspots from disk I/O operations.

Narek Atayan
2 votes, 1 answer
How to restrict VTune Analysis to a specific function
I have a program whose basic structure is as below:
main() {
some malloc() allocations and file reads into these buffers
call to an assembly language routine that needs to be optimized to the maximum
write back the…

quasar66
2 votes, 1 answer
Difficulty understanding a memory address in Intel's VTune tool
In the above image, I used the VTune tool to view the process's flow.
I also dumped memory for WinDbg.
I wanted to check the disassembled section at Engine.dll+840c1 in WinDbg, but
the result seems to be different.
Can you tell me what I'm doing wrong?

백경훈
2 votes, 3 answers
VTune - no symbols available
I have used VTune several times in the past, usually without too much trouble. Unfortunately the gaps between each use are often so long that I forget some aspects of how to use it each time. I know that the line number and symbols information needs…

Mick
2 votes, 1 answer
VTune total time in MKL function
I am working on a university project that asks me to give a breakdown of some tridiagonal eigensolvers implemented in MKL (11.1). So I implemented a testbed for that, and now I am trying to profile it in VTune (Intel VTune Amplifier XE 2013…

yomar
2 votes, 1 answer
Performance Measurement - Get Average call time per function. Intel VTune Amplifier
I'm simply trying to get the average time it takes each function to run.
That means I want the:
"Total time inside the function" / "Number of calls to the function"
I'm getting all sorts of information when I run an analysis from within VTune.
These…

ZivS
2 votes, 1 answer
FLOP measurement
I'm trying to estimate FLOPS for my application using Intel VTune Amplifier, and I'm using this post as a guideline: https://software.intel.com/en-us/articles/estimating-flops-using-event-based-sampling-ebs/
The problem is that I can't find the…

M_rr113
2 votes, 1 answer
start_thread clone taking most of the time in parallel program - bad parallelization or wrong report?
I'm currently working on parallelizing a C++ program in order to improve its performance on multi-core systems. Using OpenMP and considering the challenges (thread synchronization, data accesses, etc.) we finally found a way to make the entire…

leosh