Suppose I have a single C/C++ app running on the host. There are a few threads running on the host CPU and 50 threads running on the Xeon Phi cores.
How can I make sure that each of these 50 threads runs on its own Xeon Phi core and that its code is never evicted from that core's cache (given the code is small enough)?
Could someone please outline a very general idea of how to do this, and which tool/API would be most suitable (for C/C++ code)?
What is the fastest way to exchange data between the host thread-aggregator and the 50 Phi threads?
The actual parallelism will be very limited: this is going to be more like a plain 51-thread application with some basic multithreaded data synchronization.
Can I use a conventional C/C++ compiler to create an app like this?