
I'm looking for ways to optimize the build time of our singularity HPC containers. I know that I can save some time by building them layer by layer. But still, there is room for optimization.

What I'm interested in is using/caching whatever makes sense on the host system.

  1. CCache for C++ build artifact caching
  2. git repo cloning
  3. APT package downloads

I did some experiments but haven't succeeded in any of these points.

What I found so far:

CCache

I install ccache in the container and instruct the build system to use it. Since I run singularity build with sudo, I expect the cache to end up under /root. But after running the build, /root/.ccache is empty. I verified the generated CMake build files, and they definitely use ccache.

I even created a test recipe containing a %post

mkdir -p "$HOME/.ccache" && touch "$HOME/.ccache/test"

but the test file did not appear anywhere on the host system (not in /root and not in my user's home). Does the build step mount a container-backed directory to /root instead of the host's root dir?

Is there something more needed to be done to utilize ccache?
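For reference, a minimal test recipe for this setup could look as follows (the base image, package list, and CMake call are placeholders, not my actual project):

```
Bootstrap: docker
From: ubuntu:20.04

%post
    apt-get update && apt-get install -y ccache cmake g++
    # ccache defaults to $HOME/.ccache, i.e. /root/.ccache under sudo
    mkdir -p /root/.ccache
    ccache --zero-stats
    # make CMake wrap the compiler with ccache:
    # cmake -DCMAKE_CXX_COMPILER_LAUNCHER=ccache <src-dir>
    ccache --show-stats
```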

Git

People suggest running e.g. git-cache-http-server (https://stackoverflow.com/a/43643622/1076564) and using git config --global url."http://gitcache:1234/".insteadOf https://.
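The suggested rewrite rule, assuming a cache server reachable as gitcache:1234 (the address from the linked answer), would be set up like this:

```shell
# Redirect every https:// remote through the local cache server
# (http://gitcache:1234/ is the address assumed above).
git config --global url."http://gitcache:1234/".insteadOf https://

# Verify the rewrite rule is active:
git config --global --get url."http://gitcache:1234/".insteadOf
# prints: https://
```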

Since singularity can read parts of the host filesystem, I think there could even be a way to make this work without a proxy program. However, if the host's git repos are not inside $HOME or /tmp, how can singularity access them during build? singularity build has no --bind flag to specify additional mount directories. And using the %files section in the recipe sounds inefficient, since everything would be copied each time the build runs.

APT

People suggest using e.g. squid-deb-proxy (https://gist.github.com/dergachev/8441335). Again, since singularity is able to read host filesystem files, I'd like to just utilize the host's /var/cache/apt. But /var is not mounted into the container by default. So the same question again: how do I mount /var/cache/apt at container build time? And is it a good idea overall? Wouldn't it damage the host's APT cache, given that both host and container are based on the same Ubuntu version and architecture?

Or does singularity do some clever APT caching itself? I've just noticed it downloaded 420 MB of packages in 25 seconds, which is possible on my connection, but not very probable given the standard speed of ubuntu mirrors.


Edit: I've created an issue on singularity repo: https://github.com/hpcng/singularity/issues/5352 .

Martin Pecka

3 Answers


As far as I know, there is no mechanism for caching a singularity build when building from a definition file. You can cache the download of the base image, but that's it.

There is a GitHub issue about this, where one of the main developers of Singularity gave the following reply:

You can build a Singularity container from an existing container on disk. So you could build your base container and save it and then modify the def file to build from the existing container to save time while you prototype.

But since Singularity does not create layers there is really no way to implement this as Docker does.

One point about your question:

I know that I can save some time by building them layer by layer

Singularity does not have a concept of layers, so this does not apply here. Docker uses layers, and those are cached.

The workflow I typically follow when building Singularity images is to first create a Docker image from a Dockerfile and then convert that to a Singularity image. The Docker build step has caching, so that might be useful to you.

# Build Docker image
docker build --tag my_image:latest .
# Convert to Singularity format
sudo singularity build my_image.sif docker-daemon://my_image:latest
jkr
  • By building layer-by-layer I actually meant what you suggested with creating base containers and building the following ones off of them. And what about setting the build environment so that I could at least utilize things like ccache? – Martin Pecka May 30 '20 at 03:06

This sounds like unnecessary optimization. As mentioned, you can build from a Docker image which can take advantage of some layer caching. If you plan on a lot of iteration, you can either do that to a base docker container or create the singularity image as a sandbox and write it out to a read-only SIF once it is working as you like it. If you are making frequent code changes, you can mount the source in when running the image until it is finalized.


Singularity does some caching on the host OS, by default to $HOME/.singularity/cache (generally in /root since most of the time it's a sudo singularity build ...). You can see more detail using singularity --verbose or singularity --debug. I believe this is mostly for caching images / layers from other formats, but I've not looked too in depth at it.

Building does not mount the host filesystem and, to the best of my knowledge, cannot be made to do so. This is by design, for reproducibility. You could copy files (e.g., the apt cache) into the image in the %files block, but that seems very hackish, it is questionable whether it would be any faster, and it opens the possibility of strange bugs.

The %post steps are built in isolation within the container and nothing is mounted in, so again it won't be able to take advantage of any caching on the host OS.

tsnowlan
  • I hoped singularity was more "standalone"... but if the best way to get nice features in singularity is to use docker, then... :( Anyways, the docker->singularity conversion is not 100% reliable - I know that when I tried it once with my containers, it failed. Mounting local folders with code into a running container is exactly the reproducibility killer I'd like to avoid. And I'd really like to be able to rebuild the container from scratch from a recipe, so "runtime" editing is not a good way. You can imagine my containers as a kind of CI, where you want to verify the system builds. – Martin Pecka Jun 09 '20 at 14:17
  • It is standalone. If you want docker-style caching, you have to use docker. I have never run into any errors building from a docker image, and if you run into some I recommend making a github issue with the singularity folks. Mounting in code is a dev practice, not for producing final images. You shouldn't need to re-install an OS to test program changes. If your application changes the OS state, a read-only singularity image is probably not the solution for you. – tsnowlan Jun 09 '20 at 15:40

It turns out there is a way to utilize some caches on the host. As stated by one of the singularity developers, the host's /tmp is mounted during the %post phase of the build. It is not possible to mount any other directory.

So utilizing the host's caches is all about making the data accessible from /tmp.

CCache

Before running the build command, mount the ccache directory into /tmp:

sudo mkdir /tmp/ccache
sudo mount --bind /root/.ccache /tmp/ccache

Then add the following line to your recipe's %post and you're done:

export CCACHE_DIR=/tmp/ccache

I'm not sure how sharing the cache between root and your normal user would work, but I assume the ccache documentation on sharing caches could help (especially the umask setting).
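As a sketch of that cache-sharing idea, ccache honors a CCACHE_UMASK environment variable, so %post could relax the umask so that cache entries written as root stay group-writable (002 is just an illustrative value):

```shell
# Point ccache at the bind-mounted host cache and make new cache
# files group-writable so a non-root user in the same group can
# reuse and update them.
export CCACHE_DIR=/tmp/ccache
export CCACHE_UMASK=002
```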

APT

On the host, bind the apt cache dir:

sudo mkdir /tmp/apt
sudo mount --bind /var/cache/apt /tmp/apt

In your %setup or %post, create container file /etc/apt/apt.conf.d/singularity-cache.conf with the following contents:

Dir::Cache "/tmp/apt";
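Creating that file fits naturally into %setup, which runs on the host with the container root exposed as $SINGULARITY_ROOTFS (the fallback path below is only for illustration outside a real build):

```shell
# %setup runs on the host; $SINGULARITY_ROOTFS points at the
# container filesystem being built.
ROOTFS="${SINGULARITY_ROOTFS:-/tmp/rootfs-demo}"
mkdir -p "$ROOTFS/etc/apt/apt.conf.d"
# Tell APT inside the container to use the bind-mounted host cache.
cat > "$ROOTFS/etc/apt/apt.conf.d/singularity-cache.conf" <<'EOF'
Dir::Cache "/tmp/apt";
EOF
```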

Git

The git-cache-http-server should work seamlessly, as host ports should be accessible during build. I just did not use it in the end because it doesn't support SSH authentication. Another way would be to manually clone all repos into /tmp beforehand and then clone in the build process with the --reference flag, which should speed up the clone.
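The --reference approach could be sketched like this (the repository URL and paths are placeholders):

```shell
# On the host, before the build: keep a cached bare clone under
# /tmp, where %post can see it.
git clone --bare https://example.com/some/repo.git /tmp/gitcache/repo.git

# Inside %post: clone via the cache; objects already present in the
# reference repo are borrowed locally instead of downloaded.
git clone --reference /tmp/gitcache/repo.git \
    https://example.com/some/repo.git /opt/repo
```

Note that a clone made with --reference keeps depending on the reference repository (via .git/objects/info/alternates), which won't exist at container runtime; pass --dissociate (or run git repack -a in the clone) if the image must be self-contained.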

Martin Pecka