
We are facing the problem that many people want to run different scientific software on our high-performance computing cluster. Every user requires a different set of libraries and library versions, and we do not want the administrator to have to install a new library every time.

So we are thinking about using Docker containers for this purpose: every user can set up their own container with the userland libraries they require and then run their batch processing jobs using this container.
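For illustration, the per-user image and batch invocation might look roughly like this (a minimal sketch; the base image, package list, script name, and resource limits are all placeholders, not part of the question):

```
# Hypothetical per-user Dockerfile: the user pins the userland
# libraries and versions their code needs, without involving the admin.
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y --no-install-recommends \
        libopenblas-dev libfftw3-dev python3 python3-numpy \
    && rm -rf /var/lib/apt/lists/*
COPY simulate.py /opt/simulate.py
```

```
# Build the image once, then run the batch job with explicit
# CPU/memory limits and the user's scratch space mounted as /data.
docker build -t alice/sim:1.0 .
docker run --rm --cpus=8 --memory=16g \
    -v /scratch/alice:/data \
    alice/sim:1.0 python3 /opt/simulate.py /data/input
```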

However, as I see it, Docker is mainly focused on services rather than batch processing jobs: usually you have a (e.g. web) service that is supposed to run all the time and process new jobs (which is basically always the same task with new input data) as soon as they come in.

Our situation is quite different: a new user should be able to set up new tasks to run on the hardware and should simply get a certain amount of resources for their batch processing job.

I am thus wondering if there is already a solution for this scenario. I had a look at https://github.com/NERSC/shifter, which seems to go in the right direction, but development has stalled.

J. Doe
  • You might want to have a look at [Singularity](http://singularity.lbl.gov/), which is under active development and uses the container approach without the need to have daemons running. As far as I know, you can also directly do `mpirun -np `. – Thomas Feb 13 '17 at 15:44
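For reference, the hybrid MPI pattern the comment alludes to typically has the host's `mpirun` launch the ranks, each of which executes inside the container (a minimal sketch; the image and program names are placeholders):

```
# Host MPI launches 64 ranks; each rank runs the solver inside the image.
mpirun -np 64 singularity exec mycontainer.simg /opt/app/mpi_solver
```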

2 Answers


We use Docker containers extensively for ephemeral batch-type jobs. In our case it's intensive 3D image processing, but each container processes a "batch" of thousands of related images. We've found this use case to work very well; there's no reason not to use Docker for it.

Here are a few things to think about when designing your solution:

  1. Are all of the people submitting code trusted? If not, you'll need to have a long think about security.
  2. Ensure that you run your containers with the `--rm` flag so that they are automatically removed upon completion (see the sketch after this list).
  3. Run a local Docker registry so that 1) you're not dependent on an external registry and 2) you can configure your batch server to automatically pull images as needed.
  4. Keep track of images that haven't been used in some time and purge them from the server.
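Putting points 2 through 4 together, the day-to-day commands might look roughly like this (a minimal sketch; the registry host, image names, and retention window are assumptions):

```
# 2. Run each batch job with --rm so the container is removed on exit.
docker run --rm myregistry.local:5000/alice/sim:1.0 python3 /opt/simulate.py

# 3. Host a local registry; users push images to it and the batch
#    server pulls them on demand.
docker run -d --restart=always -p 5000:5000 --name registry registry:2
docker pull myregistry.local:5000/alice/sim:1.0

# 4. Periodically purge images created more than 30 days ago.
docker image prune -a --force --filter "until=720h"
```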
EEAA

ProActive from ActiveEon is a batch scheduler originally designed for HPC clusters. It includes a feature to launch tasks within containers. This article goes through a demo running R packages inside a Docker container.

Regarding the design questions:

  1. The workflows can be stored in a catalog with RBAC, and a secured process can be set up for adding them to it.
  2. Containers are automatically removed once the task has executed.
  3. As needed.
  4. It is possible to build a workflow that regularly tracks and removes unused images from the various resources.

Finally, an additional feature is the ability to burst into the cloud (public or private) if more capacity is required.

XYZ123