I am writing a library for scientific computing in which the core computational routines are written in C++ and exposed to the Python side of the library with pybind11.
How can I profile my C++ code to improve its performance? In particular, how can I use Intel VTune Profiler with Python scripts that call the C++ functions?
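
For concreteness, here is a minimal sketch of the kind of driver script I would like to profile. The module name `mylib_core` and the function `heavy_kernel` are hypothetical placeholders for my pybind11 extension and one of its C++ routines; the pure-Python fallback is only there so the script runs without the compiled module.

```python
import time

try:
    # Hypothetical pybind11 extension module; the name is a placeholder.
    import mylib_core
    heavy_kernel = mylib_core.heavy_kernel
except ImportError:
    # Pure-Python stand-in so the sketch runs without the compiled extension.
    def heavy_kernel(n):
        total = 0.0
        for i in range(n):
            total += i * i
        return total


def main(n=200_000, repeats=5):
    # Call the (assumed) C++ routine several times so that it dominates
    # the run time and shows up clearly in a profile.
    results = []
    for _ in range(repeats):
        results.append(heavy_kernel(n))
    return results


if __name__ == "__main__":
    start = time.perf_counter()
    main()
    elapsed = time.perf_counter() - start
    print(f"elapsed: {elapsed:.3f} s")
```

My understanding is that a script like this would be launched under the profiler with something along the lines of `vtune -collect hotspots -- python driver.py` (with `driver.py` being the file above), and that the extension should be built with debug symbols (e.g. `-g`) so that samples can be attributed to C++ source lines, but I am not sure this is the right approach.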