Use this tag to ask questions about Intel® VTune™ Profiler, an advanced performance profiler for finding and optimizing performance bottlenecks across CPU, GPU, and FPGA systems.
Questions tagged [intel-vtune]
182 questions
0 votes, 1 answer
How does thread waiting affect the execution time of the program?
In my C++ program, I am using the Boost libraries for parallel programming. In one part of the program, several threads call join() on other threads.
The program runs quite slowly for some inputs. In an attempt to improve it, I tried…

progammer
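A minimal sketch of why waiting matters, assuming the slowdown comes from threads blocked in join(): a joining thread consumes little CPU while it waits, but its wall-clock time still grows by however long the joined thread runs. Here std::thread stands in for boost::thread (both expose a blocking join()), and `caller_elapsed_ms` is a hypothetical helper written for illustration.

```cpp
#include <chrono>
#include <thread>

// Sketch: std::thread stands in for boost::thread; both provide a join()
// that blocks the calling thread until the target thread exits.
// Returns the caller's wall-clock time, in milliseconds, from starting a
// worker that runs for `worker_ms` milliseconds until join() returns.
long long caller_elapsed_ms(int worker_ms) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    std::thread worker([worker_ms] {
        // Stand-in for real work: the worker just sleeps.
        std::this_thread::sleep_for(std::chrono::milliseconds(worker_ms));
    });
    // The caller is idle here: it waits (using little or no CPU) until the
    // worker finishes, yet this time still counts toward the caller's, and
    // hence the program's, elapsed time.
    worker.join();
    const auto stop = clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

For example, `caller_elapsed_ms(50)` returns at least 50, even though the calling thread did no work of its own in that interval. A profiler that separates CPU time from wait time would show this interval as waiting rather than computing.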
0 votes, 1 answer
How to collect hardware events of ArangoDB with a profiling tool
On an Ubuntu Server 14.04 machine (kernel 4.4.0-62-generic) with an Intel Xeon CPU E5-2698 v4,
I am trying to collect hardware event counts for ArangoDB with Intel VTune.
But as soon as I start collecting, the server dies.
I think the reason is that ArangoDB is…

kbright0912
0 votes, 1 answer
Can I still profile my code when the load exceeds the cores?
Sometimes I need to profile an application while simultaneously running a large number of unrelated calculations. Often I launch enough jobs to exceed the number of cores so that I can just come back sometime…

EMiller
0 votes, 0 answers
Intel VTune CPU OpenCL Command Queue
I can view the Intel HD Graphics Command Queue with VTune, but not the CPU Command Queue. Why? Is it expected behavior to capture only GPU "events" and not those from the CPU, which are independent of the GPU?
The same OpenCL program (a…

user3819881
0 votes, 0 answers
MPI4py profiling with VTune
I have an MPI Python application that I am trying to profile with VTune. Since I am running it on an HPC system, I have to work from a terminal. I have tried several times, and I keep getting the following error:
amplxe: Error: Failed to attach to the…

neiron21
0 votes, 1 answer
How should I interpret these VTune results?
I'm trying to parallelize this code using OpenMP. OpenCV (built with IPP for best efficiency) is used as an external library.
I'm having problems with unbalanced CPU usage in parallel for loops, but there seems to be no load imbalance. As you will see,…

justHelloWorld
0 votes, 2 answers
Difficulties in understanding the assembly code of '__atomic_compare_exchange'
I program in C++ and use CAS operations for thread synchronization.
I profiled my program with VTune and found that a large portion of the time was spent in CAS operations.
I took a look at the assembly code.
The profiling result shows that the…

syko
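For context on what such a CAS hotspot usually looks like, here is a sketch of a typical compare-and-swap retry loop written with the portable std::atomic API rather than the GCC builtin `__atomic_compare_exchange`. As far as we know, on x86 both are typically compiled to a `lock cmpxchg` instruction, and when many threads contend for the same variable, sampled time tends to pile up on that instruction and on the retry loop around it. The helper `cas_add` is hypothetical and only illustrates the pattern.

```cpp
#include <atomic>

// A typical compare-and-swap (CAS) retry loop that atomically adds `delta`
// to `value`. Returns the number of failed CAS attempts, which grows when
// other threads modify `value` concurrently (contention).
int cas_add(std::atomic<int>& value, int delta) {
    int expected = value.load(std::memory_order_relaxed);
    int retries = 0;
    // compare_exchange_weak stores `expected + delta` only if `value` still
    // equals `expected`; otherwise it reloads `expected` with the current
    // value and returns false, so the loop tries again. The weak form may
    // also fail spuriously, which the loop tolerates.
    while (!value.compare_exchange_weak(expected, expected + delta,
                                        std::memory_order_acq_rel,
                                        std::memory_order_relaxed)) {
        ++retries;
    }
    return retries;
}
```

With a single thread, `cas_add` almost always succeeds on the first attempt; under heavy contention the retry count, and with it the time attributed to the CAS instruction, rises.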
0 votes, 0 answers
Cannot locate debugging symbols and a lot of idle CPU usage
I'm new to VTune Amplifier and I'm trying to profile OpenCV with a very basic application. Following this guide on recommended compiler options, I compiled OpenCV via CMake with CMAKE_BUILD_TYPE=RelWithDebInfo and -DWITH_OPENMP=ON so both -O2 and -g…

justHelloWorld
0 votes, 1 answer
Error in comparing two Intel VTune Amplifier analyses?
I'm following this video tutorial (on Linux) about VTune Amplifier and have followed every step, but at the point where he compares the two basic analyses, I get this error:
How can I solve this?

justHelloWorld
0 votes, 0 answers
Intel VTune Results Understanding - Naive Questions
The application I want to speed up performs element-wise processing of a large array (about 1e8 elements).
The processing for each element is very simple, and I suspect that the bottleneck may be DRAM bandwidth rather than the CPU.
So I decided to…

user2351152
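One way to test the DRAM-bandwidth hypothesis is to compute the bandwidth the kernel actually achieves and compare it with the platform's peak memory bandwidth (which the question does not state). A minimal sketch, assuming each element is read once and written once; `effective_bandwidth_gbs` and `scale_in_place` are hypothetical helpers, and the timing itself is left to the caller.

```cpp
#include <cstddef>
#include <vector>

// Effective memory bandwidth in GB/s (1 GB = 1e9 bytes) for a kernel that
// touches `n` elements and moves `bytes_per_element` bytes per element
// (reads plus writes) in `seconds` of wall-clock time.
double effective_bandwidth_gbs(std::size_t n, std::size_t bytes_per_element,
                               double seconds) {
    return static_cast<double>(n) * static_cast<double>(bytes_per_element) /
           seconds / 1e9;
}

// Example element-wise kernel: each float is read once and written once,
// so it moves 2 * sizeof(float) = 8 bytes per element (ignoring any extra
// traffic caused by cache write-allocate behavior).
void scale_in_place(std::vector<float>& data, float factor) {
    for (float& x : data) x *= factor;
}
```

For 1e8 floats at 8 bytes per element processed in 0.5 s, `effective_bandwidth_gbs(100000000, 8, 0.5)` gives 1.6 GB/s. If the measured figure is close to the machine's peak DRAM bandwidth, the kernel is likely memory-bound; if it is far below, the bottleneck is probably elsewhere.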
0 votes, 1 answer
Multi-threaded performance issues
I have a multi-threaded program that uses our own thread-pool implementation. The project's workload is large enough, and compared to a single thread, the program runs faster with two threads.
When we increase the number of threads beyond 2,…

ballontt
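One standard lens for scaling that stalls beyond a few threads is Amdahl's law, which bounds the ideal speedup by the fraction of run time that actually parallelizes. It does not diagnose this particular thread pool (synchronization, contention, or memory bandwidth can make real scaling worse than the bound), but it shows how quickly a serial portion caps the gains. `amdahl_speedup` is a hypothetical helper.

```cpp
// Amdahl's law: ideal speedup on `threads` threads when a fraction
// `parallel_fraction` (between 0 and 1) of the single-threaded run time
// parallelizes perfectly and the rest stays serial.
double amdahl_speedup(double parallel_fraction, int threads) {
    const double serial = 1.0 - parallel_fraction;
    return 1.0 / (serial + parallel_fraction / threads);
}
```

For example, if half of the run time parallelizes, two threads give at most about 1.33x, and no number of threads can exceed 2x.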
0 votes, 1 answer
Profiling OpenCL application on Windows with NVIDIA GPU
Can you help me?
I'm developing an OpenCL application on Windows 7 x64. The hardware is an Intel Core i5 and an NVIDIA GTX 770. OpenCL uses the NVIDIA GPU for acceleration.
If I try to use Intel VTune Amplifier XE 2015, my application hangs at the end of profiling and…

Mike
0 votes, 2 answers
System profiling: usage information of shared libraries
Is there any way to know which library files are used by which process (or by how many processes) over a given period of time?
Can VTune, perf, or OProfile be used for this?

Arjun Bora
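Independently of any profiler, on Linux the kernel lists each process's memory mappings in `/proc/<pid>/maps`, including every mapped shared library. A minimal sketch, assuming Linux, that extracts the shared-library paths from that text; sampling it periodically for each PID (and counting processes per library) is left to the caller. `shared_libraries_from_maps` is a hypothetical helper.

```cpp
#include <cstddef>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Linux-only sketch: collect the shared-library paths that appear in the
// text of /proc/<pid>/maps. Each mapping line has the form
//   address perms offset dev inode [pathname]
// so the pathname, if present, is the sixth whitespace-separated field.
// (Paths containing spaces are truncated; acceptable for a sketch.)
std::set<std::string> shared_libraries_from_maps(std::istream& maps) {
    std::set<std::string> libs;
    std::string line;
    while (std::getline(maps, line)) {
        std::istringstream fields(line);
        std::string address, perms, offset, dev, inode, path;
        if (!(fields >> address >> perms >> offset >> dev >> inode >> path)) {
            continue;  // anonymous mapping: no pathname field
        }
        // Keep names such as libfoo.so or libc.so.6 (".so" at the end of
        // the path or followed by a version suffix).
        const std::size_t pos = path.find(".so");
        if (pos != std::string::npos &&
            (pos + 3 == path.size() || path[pos + 3] == '.')) {
            libs.insert(path);
        }
    }
    return libs;
}
```

To inspect the current process, pass a `std::ifstream` opened on `/proc/self/maps`; for another process, replace `self` with its PID (subject to permissions).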
0 votes, 1 answer
How to measure Windows API code coverage of app level benchmarks
My job involves system-level performance testing with third party tools that I do not have sources for. I'm also testing Windows, and can use debugging symbols but not Windows source code. I'd like a quantitative way to describe the areas of the…

Aaron Altman
0 votes, 1 answer
Profiling with Intel VTune Amplifier
I have created a filter DLL using some static libs, and this DLL is used in GraphStudio, where it runs fine. But I need to profile my DLL, so I started GraphStudio and then VTune. In the VTune project properties, I attached it to the process…

Mohan