I need to run a Java application on a PBS cluster, and I'm unclear on how it will behave there.
The application decides how many threads to start by looking at the number of cores on the node it's running on: it starts 2 threads per core.
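For reference, the sizing logic amounts to roughly the following. This is a minimal sketch rather than the actual application code; the class and method names are made up for illustration.

```java
// Hypothetical sketch of the "2 threads per core" rule described above.
class ThreadSizing {
    static final int THREADS_PER_CORE = 2;

    // Number of worker threads to start for a given core count.
    static int threadsFor(int cores) {
        return Math.max(1, cores) * THREADS_PER_CORE;
    }

    public static void main(String[] args) {
        // Number of processors the JVM reports as available to it.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores=" + cores + " threads=" + threadsFor(cores));
    }
}
```

Note that `availableProcessors()` reports what the JVM sees on the node, which is not necessarily what the job was granted.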
The best thing for me would be to request full access to a node in the PBS cluster, that is, to reserve all cores on that node. I haven't seen how I can do this. All I see is the ppn parameter, which requests a specific number of cores per node. The nodes are heterogeneous, though, so I don't want to specify a single ppn number; the right value would depend on the type of node I get.
If this is not possible, I need to understand how jobs behave when ppn is specified. I could instruct the Java application to create only X threads, but I don't think I would have any control over which cores these threads run on. Creating 2 threads per core is a rule of thumb for us, and it could happen that all threads want to run all the time; in that case I would be using 100% more CPU resources than I requested. Is my understanding correct that PBS won't actively restrict my process to the requested cores, but may monitor it and even kill it if it exceeds the resource usage that was specified?
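To make the "only X threads" option concrete, here is a sketch of what I have in mind. It assumes a TORQUE-style PBS that exports the granted cores per node in the `PBS_NUM_PPN` environment variable; other PBS flavours may use a different variable or none at all, so treat that name as an assumption. The class and method names are hypothetical.

```java
// Hypothetical sketch: size the thread count from the cores PBS granted,
// falling back to the node's core count when no usable value is present.
class PbsThreadCap {
    // Parse a core count from an environment value; use fallback if it is
    // missing, not a number, or not positive.
    static int coresFromEnv(String value, int fallback) {
        if (value == null) {
            return fallback;
        }
        try {
            int n = Integer.parseInt(value.trim());
            return n > 0 ? n : fallback;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        int nodeCores = Runtime.getRuntime().availableProcessors();
        // ASSUMPTION: PBS_NUM_PPN is set by the scheduler (TORQUE-style).
        int granted = coresFromEnv(System.getenv("PBS_NUM_PPN"), nodeCores);
        int cores = Math.min(granted, nodeCores);
        System.out.println("granted cores=" + cores + " threads=" + (2 * cores));
    }
}
```

This only limits how many threads are created. It does not pin them to particular cores, which is the part I'm unsure about.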
TL;DR:
- Can I request full access to a node (reserve all the cores on the node I get for a job)?
- If I request only some fraction of the cores on a node, will PBS kill my job if I exceed that limit?