I am working on a university project that asks me to give a breakdown of some tridiagonal eigensolvers implemented in MKL (11.1). I implemented a testbed for that, and now I am trying to profile it in VTune (Intel VTune Amplifier XE 2013 Update 16). I need to find the bottlenecks, i.e. in which part of the code (MKL, not mine) and in which functions called by the eigensolver I am spending the most time.
To do that, I was hoping to get the total time spent in each function including its callees. However, all I am getting is the self-time of each function.
My code was compiled with icc 14.0/3.174, and I tried linking MKL both statically and dynamically.
I do hope I am not overlooking something stupid here. I am also very open to other suggestions on how to find the required values.