I'm going to set up a Linux server (probably CentOS) in a computer science department. The server will be used as a compute server by people doing research on GPU computing, bioinformatics, or AI.
Hypothetically I could just give a shell to each user and let them launch their jobs, and probably that's just what I'll do at the beginning.
However, I'm faced with a potential problem: sometimes the machine will be used as a computing facility, with the aim of just getting the computation results, while at other times it will be used as a benchmarking platform, to measure the efficiency of new techniques/algorithms/whatever.
This means that, while the server is being used for a task of the second kind, other users should not be able to launch heavy tasks of their own, which would interfere with the benchmarking results.
So I'd like to set up, and possibly automate, a system along these lines:
- Typically, users have no resource limits, and different jobs are scheduled and share the system's resources normally.
- If a user launches a "priority" job, all other users are moved into a restricted `cgroup`, limited to only one or two of the available CPUs and to a capped amount of memory.
- The priority job is launched in a separate `cgroup` that has access to all the other CPUs and has no limit on memory usage (roughly as sketched below).
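For concreteness, here is the kind of thing I imagine the automation doing, written as a rough Python sketch against the cgroup v2 filesystem. Everything in it is an assumption on my part: that cgroup v2 is mounted at `/sys/fs/cgroup`, that the script runs as root, and the group names, CPU ranges, and the 4 GiB cap are placeholders I made up. I also realize this fights with systemd's own cgroup management, which is exactly why I'm hoping an existing tool already does this properly.

```python
#!/usr/bin/env python3
"""Rough sketch of the "priority mode" toggle I have in mind.

Assumptions: cgroup v2 mounted at /sys/fs/cgroup, run as root,
and the group names "restricted"/"priority" are just placeholders.
"""
from pathlib import Path

CGROUP_ROOT = Path("/sys/fs/cgroup")
RESTRICTED = CGROUP_ROOT / "restricted"   # hypothetical group name
PRIORITY = CGROUP_ROOT / "priority"       # hypothetical group name


def write(path: Path, value: str) -> None:
    """Write a single value to a cgroup control file."""
    path.write_text(value)


def enter_priority_mode(priority_pid: int, ncpus: int) -> None:
    """Confine everyone else to CPUs 0-1 and 4 GiB; give the rest to the priority job."""
    RESTRICTED.mkdir(exist_ok=True)
    PRIORITY.mkdir(exist_ok=True)

    # Make the cpuset and memory controllers available to the child groups.
    write(CGROUP_ROOT / "cgroup.subtree_control", "+cpuset +memory")

    # Restricted group: two CPUs and a 4 GiB memory cap (arbitrary numbers).
    write(RESTRICTED / "cpuset.cpus", "0-1")
    write(RESTRICTED / "memory.max", str(4 * 1024**3))

    # Priority group: all remaining CPUs, no memory limit.
    write(PRIORITY / "cpuset.cpus", f"2-{ncpus - 1}")
    write(PRIORITY / "memory.max", "max")

    # Move the priority job into its group.
    write(PRIORITY / "cgroup.procs", str(priority_pid))

    # Move every other user process into the restricted group.
    for procs in (CGROUP_ROOT / "user.slice").rglob("cgroup.procs"):
        for pid in procs.read_text().split():
            if int(pid) != priority_pid:
                try:
                    write(RESTRICTED / "cgroup.procs", pid)
                except OSError:
                    pass  # process may have exited in the meantime


# e.g. enter_priority_mode(priority_pid=12345, ncpus=32)
```

When the priority job finishes, the whole thing would have to be reverted (everyone moved back, limits lifted), ideally automatically rather than by hand.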
Is there some software package that helps automate such an architecture? Everything I find on the internet talks about orchestrating containers, but the difference here is that I want to restrict the resources used by others while my job is running, so launching the job in a container does not help.
I've also looked at something like `dockersh` to implement the reverse: everybody logs in directly inside a container, so I can easily allocate resources to each user on demand. But `dockersh` seems unmaintained, and I didn't find anything else that implements the same concept.