0

I am running 60 MPI processes and MKL_THREAD_NUM is set to 4 to get me to the full 240 hardware threads on the Xeon Phi. My code is running, but I want to make sure that MKL is actually using 4 threads. What is the best way to check this with the limited Xeon Phi Linux kernel?

ADF
  • [Possible duplicate](http://superuser.com/questions/80556/how-do-you-view-all-threads-running-on-linux). The first answer mentions `ps -eLf` showing all processes and threads. See examples at [man ps](http://man7.org/linux/man-pages/man1/ps.1.html). – Kenney Nov 20 '15 at 20:16
  • 1
    Also, just so you know, 240 threads are not necessarily optimal. Several benchmarks show 120 threads being optimum. Although the Xeon Phi (KNC) can handle 4 threads / core, it can only sustain 2 fused-multiply-adds / core / cycle. The 4-way SMT per core exists in the hope that while one of the currently "hot" threads is stalled, one of the two standby threads can execute. However, if the two "hot" threads are already well-tuned, then the standby threads may damage performance by evicting data from cache that would otherwise have stayed, leading to an overall slowdown for all four threads. – Iwillnotexist Idonotexist Nov 20 '15 at 21:35
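As a rough illustration of the `ps -eLf` suggestion in the comment above: on Linux, each thread of a process appears as a directory under `/proc/<pid>/task`, and `/proc/<pid>/status` carries a `Threads:` line. The sketch below assumes a Python interpreter is available where the process runs, which may not be the case on the coprocessor; the helper names are hypothetical, and the same files can be inspected from a shell instead.

```python
import os
import threading


def count_threads(pid="self"):
    """Count kernel threads of a process by listing /proc/<pid>/task."""
    return len(os.listdir(f"/proc/{pid}/task"))


def threads_from_status(pid="self"):
    """Read the 'Threads:' field of /proc/<pid>/status, or None if absent."""
    with open(f"/proc/{pid}/status") as fh:
        for line in fh:
            if line.startswith("Threads:"):
                return int(line.split()[1])
    return None


if __name__ == "__main__":
    # Demo: start a few idle threads and watch the count change.
    stop = threading.Event()
    workers = [threading.Thread(target=stop.wait) for _ in range(3)]
    print("before:", count_threads())
    for w in workers:
        w.start()
    print("after starting 3 threads:", count_threads())
    stop.set()
    for w in workers:
        w.join()
```

Pass the PID of one MPI rank instead of `"self"` to inspect that rank. The count includes every thread in the process, not only the ones MKL created, so treat it as an upper bound.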

2 Answers

1

You can set MKL_NUM_THREADS to 4 if you like. However, using every single thread does not necessarily give the best performance. In some cases, the MKL library knows things about the algorithm that mean fewer threads are better, and the library routines can then choose to use fewer threads.

You should only use 60 MPI ranks if you have 61 cores, so that one core stays free for the OS and system-level processes. If you are going to use that many MPI ranks, you will want to set the I_MPI_PIN_DOMAIN environment variable to "core". This will put one rank per core on the coprocessor and allow all the OpenMP threads for each MPI process to reside on the same core, giving you better cache behavior.

If you do this, you can also use micsmc in GUI mode on the host processor to continuously monitor the activity on all the cores. With one MPI process per core, you can see how much of the time all threads on a core are being used.
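The arithmetic behind this layout can be sketched as follows. This is only an illustration: the function name `plan_layout` and its parameters are hypothetical, and the defaults (61 cores, one reserved, 4 hardware threads per core) are taken from the numbers in this thread rather than queried from the hardware.

```python
def plan_layout(total_cores=61, reserved_cores=1, hw_threads_per_core=4):
    """Sketch of a one-rank-per-core layout with one core left for the OS."""
    ranks = total_cores - reserved_cores
    if ranks <= 0:
        raise ValueError("no cores left for MPI ranks")
    return {
        "ranks": ranks,
        "threads_per_rank": hw_threads_per_core,
        "total_threads": ranks * hw_threads_per_core,
        # Environment settings named in the answer above.
        "env": {
            "I_MPI_PIN_DOMAIN": "core",
            "MKL_NUM_THREADS": str(hw_threads_per_core),
        },
    }


if __name__ == "__main__":
    layout = plan_layout()
    print(layout["ranks"], "ranks x", layout["threads_per_rank"],
          "threads =", layout["total_threads"])
```

Calling `plan_layout(hw_threads_per_core=2)` gives the 120-thread configuration mentioned in the comments on the question, which may be worth benchmarking against the 240-thread one.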

froth
0

Set MKL_NUM_THREADS to 4. You can use an environment variable or a runtime call. This value will be respected, so there is nothing to check.

The Linux kernel on KNC is not stripped down, so I don't know why you think that's a limitation. You should not use any system calls for this anyway, though.
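If you still want to confirm that the setting reached a given process, one low-cost check is to look at the environment it actually sees. The sketch below is an assumption-laden illustration, not part of MKL: the helper names are hypothetical, and it only inspects environment variables, so it says nothing about a value set later through a runtime call. It also flags the spelling `MKL_THREAD_NUM` from the question, since the name used in both answers is `MKL_NUM_THREADS`.

```python
import os


def requested_mkl_threads(env=None):
    """Return MKL_NUM_THREADS as a positive int, or None if unset or invalid."""
    env = os.environ if env is None else env
    raw = env.get("MKL_NUM_THREADS")
    if raw is None:
        return None
    try:
        value = int(raw)
    except ValueError:
        return None
    return value if value > 0 else None


def suspicious_names(env=None):
    """List variables that look like a misspelling of MKL_NUM_THREADS."""
    env = os.environ if env is None else env
    # Only the spelling seen in the question is checked here.
    return [name for name in ("MKL_THREAD_NUM",) if name in env]


if __name__ == "__main__":
    print("MKL_NUM_THREADS =", requested_mkl_threads())
    for name in suspicious_names():
        print("warning:", name, "is set; did you mean MKL_NUM_THREADS?")
```

Running this under the same launcher as the real job shows what each rank inherits. From compiled code, MKL's own `mkl_get_max_threads()` reports the limit the library will work with.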

Jeff Hammond