I am trying to use a custom matvec operator with PETSc MatShell in Fortran, and inside it I want to combine OpenMP threading with multithreaded MKL (BLAS) calls.
Both the OpenMP and the MKL threads are launched, but htop shows that only the OpenMP threads are busy: 2 threads at 100% each (200% CPU in total), even though 48 cores are available.
The remaining threads (MKL) are visible in htop, but they sit at 0% CPU.
How can I get the MKL threads to actually do work alongside the OpenMP threads, so that all available cores are used?
Edit: I'm happy to post more details. I kept this message short at first in case someone has already run into the same issue.