Use this tag for questions about Intel® VTune™ Profiler, a performance profiler for finding and fixing performance bottlenecks across CPU, GPU, and FPGA systems.
Questions tagged [intel-vtune]
182 questions
1
vote
2 answers
Seeking maximum bitmap (aka bit array) performance with C/Intel assembly
Following on from my two previous questions, How to improve memory performance/data locality of 64-bit C/intel assembly program and Using C/Intel assembly, what is the fastest way to test if a 128-byte memory block contains all zeros?, I have…

eyepopslikeamosquito
- 185
- 1
- 7
1
vote
2 answers
How to improve memory performance/data locality of 64-bit C/intel assembly program
I am using a hobby program to teach myself high performance computing techniques.
My PC has an Intel Ivy Bridge Core i7 3770 processor with 32 GB of memory and the free version of the Microsoft VS2010 C compiler.
The 64-bit program needs about 20 GB of…

eyepopslikeamosquito
- 185
- 1
- 7
1
vote
1 answer
Intel VTune weird behavior for Fortran code on Linux
I have compiled the Fortran program with different optimization flags. I have one program compiled with the default optimization flag -O2 and another compiled with the -fast optimization flag. I was able to open the program compiled with optimization flag…

Jd Baba
- 5,948
- 18
- 62
- 96
1
vote
1 answer
Running Intel's VTune Periodically
In previous versions of VTune, there was a program called dsep.exe, which could be used to periodically poll hardware counters (specifically related to DRAM reads/writes) from VTune. This allowed me to gather counter data about each instance in…

Shookit
- 1,162
- 2
- 13
- 29
1
vote
1 answer
Effects of Loop unrolling on memory bound data
I have been working with a piece of code that is heavily memory bound. I am trying to optimize it within a single core by manually implementing cache blocking, software prefetching, loop unrolling, etc. Even though cache blocking gives significant…

Anusuya
- 11
- 1
0
votes
1 answer
OpenMP, VTune, idle threads
I use VTune to check the concurrency of my code. Here is a screenshot of the output. You can see that there is an initial period with 1 thread, then ~0.3 sec of intensive multi-threaded work (brown spikes), and then almost 3 seconds of idle (no brown…

Jakub M.
- 32,471
- 48
- 110
- 179
0
votes
0 answers
x86: movsxd taking a long time on Intel's Cascade Lake machine (Core i9-10980XE)
Using Intel's VTune tool, I notice that the
movsxd rax, edx
instruction is taking quite some time to execute. I understand that we access both 32-bit and 64-bit registers in this assembly code, but is it expected to take a long time to…

Vignesh
- 1
- 2
0
votes
0 answers
Profiling a simple Python script with VTune on Ubuntu
I am trying to profile a simple Python script with VTune.
import numpy as np
def my_function():
    res = 0
    for i in range(100000000):
        res = res + i
    final_res = np.log(res)
    return final_res
def my_function2():
    res = 0
…

chris sidi
- 1
- 1
0
votes
1 answer
How to use Intel VTune to detect where UPI flows come from?
In the memory access evaluation module, there is a panel called the platform diagram, which shows the outgoing UPI utilization. Is it possible for me to find out where those UPI flows come from?
It would be so great if I…

Shadow_visual
- 19
- 2
0
votes
1 answer
How to write a simple VTune wrapper script on Windows?
Question
How do I write a wrapper script for VTune for Windows?
The documentation provides a simple wrapper script example for bash:
#!/bin/bash
# Prefix script
echo "Target process PID: $VTUNE_TARGET_PID"
# Run VTune collector
"$@"
# Postfix…

Samufi
- 2,465
- 3
- 19
- 43
0
votes
0 answers
Can we tune the sampling precision of Intel VTune to get the exact delay of each instruction?
When I use Intel VTune to profile an application in memory access mode, some instructions show a huge delay in my results, as shown below.
vtune result shows huge delay of some instructions
Obviously, these two register sub instructions will…

zongwu wang
- 11
- 2
0
votes
1 answer
What to do when the program executes too fast for hotspots to be found (Intel VTune Profiler)
I am trying to profile a C application to find hotspots in the code. However, I have an issue where the program completes too fast for VTune to properly collect the data.
I cannot change the original program in any way, so trying to make it take…

Ajay Varghese
- 89
- 1
- 1
- 5
0
votes
0 answers
VTune profile result shows a significant portion of CPU time is spent spinning
I am using VTune Threading mode to profile my project, which runs on Windows 10. In the Top-down view, I found that 39.1% of CPU time is spent spinning; the call stack is shown in the picture below.
But in the Platform View, I find the spinning call stack only costs 40us per…

charlesJKing
- 21
- 3
0
votes
1 answer
Internal Error & Collection Failed while doing HW event based analysis with VTune
I have a CentOS 7 machine with Ubuntu 18.04. I am trying to collect hotspots for a particular application and am getting an error. How do I fix this error?
$ /opt/intel/oneapi/vtune/2022.1.0/bin64/vtune -collect hotspots -knob sampling-mode=hw -knob…

Aishwarya
- 29
- 4
0
votes
1 answer
Getting Error When running VTune GUI on an Ubuntu Virtual Machine
I am using Intel VTune Profiler 2022.1.0 on Ubuntu 18.04.
I was running the VTune GUI successfully on an Ubuntu VM using VMware Fusion on macOS Monterey.
Then, I installed the kernel debug symbol packages (linux-image-5.4.0-107-generic-dbgsym) using…

Aishwarya
- 29
- 4