I have recently started using Docker to ensure the computational reproducibility of my research. The HPC service at my institution only supports Singularity, so I want to import my Docker image into Singularity whenever I run part of my analysis on the HPC. When I did this, however, I found that the results from the original Docker image differ from those from the same image imported into Singularity.
Here is what I did to fit a simple Bayesian regression model directly in the Docker image. I ran this both locally and on an AWS instance, and both runs gave identical output (as expected).
docker pull akiramurakami/gramm-mor:v1.0
docker run -it akiramurakami/gramm-mor:v1.0 bash
Rscript -e 'library("brms"); library("tidyverse"); set.seed(1); d <- tibble(x = rnorm(100), y = 2 * x - 1 + rnorm(100)); m <- brm(y ~ x, data = d, seed = 1); summary(m)'
Below is part of the output.
Population-Level Effects:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -1.04      0.10    -1.23    -0.85 1.00     3812     2469
x             2.00      0.11     1.79     2.21 1.00     4625     3037
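Because the same one-liner is reused on every machine, it may help to keep it in a file and run it non-interactively, so that each run's output can be saved and compared. This is only a sketch: the file names `fit.R`, `docker.txt` and `singularity.txt` are hypothetical, and the `docker run` line assumes the current directory is mounted at `/work` inside the container.

```shell
#!/bin/sh
# Write the model code once (fit.R is a hypothetical file name) so that every
# run, under Docker or Singularity, executes exactly the same R code.
cat > fit.R <<'EOF'
library("brms")
library("tidyverse")
set.seed(1)
d <- tibble(x = rnorm(100), y = 2 * x - 1 + rnorm(100))
m <- brm(y ~ x, data = d, seed = 1)
summary(m)
EOF

# Run non-interactively and keep the output; uncomment the line for the
# runtime available on the current machine:
# docker run --rm -v "$PWD":/work -w /work akiramurakami/gramm-mor:v1.0 Rscript fit.R > docker.txt
# singularity exec gramm-mor_v1.0.sif Rscript fit.R > singularity.txt
```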
Here is what I did on the HPC, using Singularity.
singularity pull docker://akiramurakami/gramm-mor:v1.0
singularity exec gramm-mor_v1.0.sif Rscript -e 'library("brms"); library("tidyverse"); set.seed(1); d <- tibble(x = rnorm(100), y = 2 * x - 1 + rnorm(100)); m <- brm(y ~ x, data = d, seed = 1); summary(m)'
And the results are different: see the Bulk_ESS and Tail_ESS columns in particular, although some of the 95% credible-interval bounds also differ in the last digit.
Population-Level Effects:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -1.04      0.10    -1.23    -0.84 1.00     3798     2826
x             2.00      0.11     1.78     2.22 1.00     4275     2913
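To list exactly which cells changed between the two runs, the two tables can be saved to text files and compared cell by cell. The sketch below uses a hypothetical helper, `compare_summaries`, and assumes each file holds only the table lines (header plus one row per parameter), with every parameter row having exactly eight whitespace-separated fields, as in the tables above.

```shell
#!/bin/sh
# compare_summaries (hypothetical helper): report every cell that differs
# between two saved brms summary tables. Parameter rows are recognised by
# having exactly 8 whitespace-separated fields; all other lines are ignored.
compare_summaries() {
  awk '
    BEGIN {
      # Labels for fields 1..8 of a parameter row.
      split("parameter,Estimate,Est.Error,l-95% CI,u-95% CI,Rhat,Bulk_ESS,Tail_ESS", name, ",")
    }
    NF != 8 { next }                  # skip the header and any other lines
    NR == FNR { row[$1] = $0; next }  # first file: store rows by parameter name
    ($1 in row) {                     # second file: compare with the stored row
      split(row[$1], old, " ")
      for (i = 2; i <= 8; i++)
        if (old[i] != $i)
          printf "%s %s: %s -> %s\n", $1, name[i], old[i], $i
    }
  ' "$1" "$2"
}

# Usage: sh compare_summaries.sh docker.txt singularity.txt
if [ "$#" -eq 2 ]; then
  compare_summaries "$1" "$2"
fi
```

Applied to the two tables above, it reports differences only in the credible-interval bounds and the two ESS columns; Estimate, Est.Error and Rhat agree.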
Why does this happen, and is there a way to import and use a Docker image in Singularity so that it yields the same results as the original Docker image?
Below is the Dockerfile used.
FROM rocker/r-ver:3.6.3
LABEL "maintainer"="xxx"
RUN apt-get update -qq && apt-get -y --no-install-recommends install \
file \
git \
libapparmor1 \
libclang-dev \
libcurl4-openssl-dev \
libedit2 \
libssl-dev \
lsb-release \
multiarch-support \
psmisc \
procps \
python-setuptools \
sudo \
wget \
libxml2-dev \
libcairo2-dev \
libsqlite-dev \
libmariadbd-dev \
libmariadbclient-dev \
libpq-dev \
libssh2-1-dev \
unixodbc-dev \
libsasl2-dev \
clang
# https://github.com/stan-dev/rstan/wiki/Installing-RStan-on-Linux
RUN Rscript -e 'dotR <- file.path(Sys.getenv("HOME"), ".R"); \
if (!file.exists(dotR)) dir.create(dotR); \
M <- file.path(dotR, "Makevars"); \
if (!file.exists(M)) file.create(M); \
cat("\nCXX14FLAGS=-O3 -march=native -mtune=native -fPIC","CXX14=clang++",file = M, sep = "\n", append = TRUE)'
RUN Rscript -e 'options(repos = list(CRAN = "http://mran.revolutionanalytics.com/snapshot/2020-07-01")); \
install.packages(c("brms", "data.table", "devtools", "SnowballC", "tidyverse", "dplyr"))'
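One difference between the two runtimes that may be worth checking (a hypothesis, not a confirmed explanation): the Dockerfile writes `$HOME/.R/Makevars` at build time, when `RUN` steps execute as root, so the file presumably ends up under `/root/.R`. `docker run` also runs as root by default, so R finds it, whereas Singularity normally runs as the calling user with that user's host home directory as `HOME`, in which case R may compile the Stan model without these settings. Separately, `-march=native` tells the compiler to optimise for whichever CPU performs the compilation. The sketch below only reports which Makevars each runtime sees; the script name `check_makevars.sh` is hypothetical, and the usage lines assume the current directory is visible inside the container.

```shell
#!/bin/sh
# check_makevars.sh (hypothetical name): print the user and HOME in effect,
# plus the user-level Makevars file that R would look for under that HOME.
# Usage (assumes the script is in the current directory):
#   docker run --rm -v "$PWD":/work akiramurakami/gramm-mor:v1.0 sh /work/check_makevars.sh
#   singularity exec gramm-mor_v1.0.sif sh check_makevars.sh
show_makevars() {
  echo "user=$(id -un) HOME=$HOME"
  # R_MAKEVARS_USER, if set, points R to a different user Makevars file.
  if [ -n "$R_MAKEVARS_USER" ]; then
    echo "R_MAKEVARS_USER is set to $R_MAKEVARS_USER"
  fi
  if [ -r "$HOME/.R/Makevars" ]; then
    echo "--- $HOME/.R/Makevars ---"
    cat "$HOME/.R/Makevars"
  else
    echo "no readable $HOME/.R/Makevars"
  fi
}

show_makevars
```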
Update on the 29th of August, 2020:
I have asked the same question on the Stan Forums and received some useful comments, although the exact reason for the difference still remains unclear.