Questions tagged [intel-vtune]

Use this tag for questions about Intel® VTune™ Profiler, a performance profiler for finding and fixing performance bottlenecks across CPU, GPU, and FPGA systems.

182 questions
1
vote
2 answers

Seeking maximum bitmap (aka bit array) performance with C/Intel assembly

Following on from my two previous questions, How to improve memory performance/data locality of 64-bit C/intel assembly program and Using C/Intel assembly, what is the fastest way to test if a 128-byte memory block contains all zeros?, I have…
1
vote
2 answers

How to improve memory performance/data locality of 64-bit C/intel assembly program

I am using a hobby program to teach myself high performance computing techniques. My PC has an Intel Ivy Bridge Core i7 3770 processor with 32 GB of memory and the free version of Microsoft vs2010 C compiler. The 64-bit program needs about 20 GB of…
1
vote
1 answer

Intel VTune weird behavior for Fortran code on Linux

I have compiled a Fortran program with different optimization flags: one build with the default optimization flag -O2 and another with the -fast optimization flag. I was able to open the program compiled with optimization flag…
Jd Baba
1
vote
1 answer

Running Intel's VTune Periodically

In previous versions of VTune, there was a program called dsep.exe, which could be used to periodically poll hardware counters (specifically related to DRAM reads/writes) from VTune. This allowed me to gather counter data about each instance in…
Shookit
1
vote
1 answer

Effects of Loop unrolling on memory bound data

I have been working with a piece of code that is heavily memory bound. I am trying to optimize it within a single core by manually implementing cache blocking, software prefetching, loop unrolling, etc. Even though cache blocking gives significant…
Anusuya
0
votes
1 answer

OpenMP, VTune, idle threads

I use VTune to check the concurrency of my code. Here is a screenshot of the output. You can see that there is an initial period with 1 thread, then ~0.3 sec of intensive multi-threaded work (brown spikes), and then almost 3 seconds of idle (no brown…
Jakub M.
0
votes
0 answers

x86: movsxd taking a long time on Intel's Cascade Lake machine (Core i9-10980XE)

Using Intel's VTune tool, I notice that the movsxd rax, edx instruction is taking quite some time to execute. I understand that we access both 32-bit and 64-bit registers in this assembly code, but is it expected to take a long time to…
Vignesh
0
votes
0 answers

Profiling a simple Python script with VTune on Ubuntu

I am trying to profile a simple Python script with VTune. import numpy as np def my_function(): res = 0 for i in range(100000000): res = res + i final_res = np.log(res) return final_res def my_function2(): res = 0 …
0
votes
1 answer

How to use Intel VTune to detect where UPI flows come from?

In the memory access evaluation module, there is a panel called the platform diagram, which shows the outgoing UPI utilization. I would like to ask whether it is possible to find out where those UPI flows come from. It would be so great if I…
0
votes
1 answer

How to write a simple VTune wrapper script on Windows?

Question How do I write a wrapper script for VTune for Windows? The documentation provides a simple wrapper script example for bash: #!/bin/bash # Prefix script echo "Target process PID: $VTUNE_TARGET_PID" # Run VTune collector "$@" # Postfix…
Samufi
0
votes
0 answers

Can we tune the sampling precision of Intel VTune to get the exact delay of each instruction?

When I use Intel VTune to profile an application in memory access mode, some instructions show a huge delay in my results, as shown below. [Screenshot: VTune result shows huge delay of some instructions] Obviously, these two register sub instructions will…
0
votes
1 answer

What to do when the program executes too fast for hotspots to be found? (Intel VTune Profiler)

I am trying to profile a C application to find hotspots in the code. However, I have an issue where the program completes too fast for VTune to properly collect the data. I cannot change the original program in any way, so trying to make it take…
Ajay Varghese
0
votes
0 answers

VTune profile result shows a significant portion of CPU time spent spinning

I am using VTune Threading mode to profile my project, which is running on Windows 10. In the Top-down view, I found that 39.1% of CPU time is spent spinning; the call stack is shown in the picture below. But in the Platform view, I find the spinning call stack only costs 40us per…
0
votes
1 answer

Internal Error & Collection Failed while doing HW event-based analysis with VTune

I have a CentOS 7 machine with Ubuntu 18.04 and am trying to collect hotspots for a particular application, but I am getting an error. How can I fix this error? $ /opt/intel/oneapi/vtune/2022.1.0/bin64/vtune -collect hotspots -knob sampling-mode=hw -knob…
Aishwarya
0
votes
1 answer

Getting an Error When Running VTune GUI on an Ubuntu Virtual Machine

I am using Intel VTune Profiler 2022.1.0 on Ubuntu 18.04. I was running the VTune GUI successfully on an Ubuntu VM using VMware Fusion on macOS Monterey. Then, I installed the kernel debug symbol packages (linux-image-5.4.0-107-generic-dbgsym) using…
Aishwarya