I am writing a library for scientific computing tasks in which the core computational routines are written in C++, and pybind11 is used to expose them to the Python side of the library.

How can I profile my C++ code to improve its performance? In particular, how can I use the Intel VTune Profiler with Python scripts that call the C++ functions?

Fracton
  • I have used `vtune-gui` with my binary, but I get an error saying that no data was collected in the result, even though my terminal shows that the wrapper script executed successfully. – Fracton Dec 19 '22 at 05:22
  • Python is not magical. The interpreter is just a normal C program. Can you use vtune to profile a hello world C program? – n. m. could be an AI Dec 19 '22 at 05:47
  • I didn't understand your question. – Fracton Dec 19 '22 at 06:08
  • Write a very simple program in C. Profile it with vtune. What happens? Do you get any errors? – n. m. could be an AI Dec 19 '22 at 06:12
  • OK, I don't have any issues with that. I actually compile my C++ code as a shared object, and within the C++ code I use pybind11 to expose the C++ functions to Python. The only way I see to test them is through Python scripts that call them. VTune does allow one to use a wrapper script with the binary, but when I use it I see that no data is collected. – Fracton Dec 19 '22 at 06:17
  • It isn't quite clear what exactly "use wrapper script with the binary" should mean. Your simple program is an executable and you can profile it. The Python interpreter is another executable and you should be able to profile it the same exact way. If the interpreter loads a shared library somehow (via a script and pybind11, or by any other means) you should see profile data for that library. – n. m. could be an AI Dec 19 '22 at 06:26
  • I hear vtune supports profiling Python scripts and this might be what you have tried to do. I am not familiar with this functionality. – n. m. could be an AI Dec 19 '22 at 06:36
  • Have you compiled your shared library with debug symbols (`-g` for gcc or clang)? They are usually necessary to get any meaningful profiling results. Note that you should nevertheless compile with optimizations enabled. – Sedenion Dec 19 '22 at 06:48
  • Not sure what the problem is. I just made a simple pybind11 module and profiled it in hotspot mode with vtune, twice: once by profiling the Python executable and once by profiling the script. Either way the data was collected just fine. – n. m. could be an AI Dec 19 '22 at 07:43
  • Presumably the bit you're interested in profiling is your "core computational routines", not the marshalling done by pybind. In which case, why involve Python in it at all? Write performance tests in C++ and profile those. – Dan Mašek Dec 19 '22 at 11:50
  • @DanMašek I get what you are saying, but writing performance tests in python is much easier. – Fracton Dec 20 '22 at 07:15
  • @n.m. Could you maybe post your example as an answer? – Fracton Dec 20 '22 at 07:16
  • Just copy the first toy example from the pybind11 site. It is literally it. – n. m. could be an AI Dec 20 '22 at 12:19
  • Hi @Fracton, we could reproduce your issue and have forwarded it to the respective team. We apologize for the inconvenience caused! – AlekhyaV - Intel Jan 10 '23 at 06:08
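
The approach suggested in the comments (build the shared library with `-g` while keeping optimizations on, then profile the Python interpreter like any other executable) might look roughly like the sketch below. It is not a confirmed fix for the "no data collected" problem. It assumes the `vtune` command-line tool is on `PATH` and accepts `-collect hotspots` and `-result-dir`; `bench.py` (a script that imports the pybind11 module) and the result directory name are hypothetical.

```python
import shutil
import subprocess
import sys


def vtune_command(script, result_dir="vtune_hotspots", script_args=()):
    """Build a command line that profiles the Python interpreter running `script`."""
    return [
        "vtune",
        "-collect", "hotspots",      # hotspot analysis, as mentioned in the comments
        "-result-dir", result_dir,   # hypothetical result directory name
        "--",                        # everything after this is the profiled command
        sys.executable, script, *script_args,
    ]


def run_profile(script, **kwargs):
    """Run the collection if `vtune` is installed; return None otherwise."""
    if shutil.which("vtune") is None:
        return None
    return subprocess.run(vtune_command(script, **kwargs), check=False).returncode


if __name__ == "__main__":
    # "bench.py" is a hypothetical driver script that calls the C++ routines.
    print(" ".join(vtune_command("bench.py")))
```

Because the target is the interpreter itself, time spent inside the shared library should be attributed to the C++ functions only if the library was built with debug symbols, and the driver script needs to run the routines long enough for the sampler to collect data.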

0 Answers