
I'm building a pipeline with Snakemake. One rule runs an R script that reads a CSV file with readr. When I run the pipeline with --use-singularity and --use-conda, I get this error:

Error: Unknown TZ UTC
In addition: Warning message:
In OlsonNames() : no Olson database found
Execution halted

Google suggests readr is crashing because tzdata is missing, but I can't figure out how to install the tzdata package so that readr sees it. I am running the entire pipeline in a Mambaforge container to ensure reproducibility. Snakemake recommends Mambaforge over a Miniconda container because it's faster, but I think the error is specific to Mambaforge, since switching to the Miniconda container makes it go away.

Here's a workflow to reproduce the error:

#Snakefile
singularity: "docker://condaforge/mambaforge"

rule targets:
    input:
        "out.txt"

rule readr:
    input:
        "input.csv"
    output:
        "out.txt"
    conda:
        "env.yml"
    script:
        "test.R"

#env.yml
name: env
channels:
    - defaults
    - bioconda
    - conda-forge
dependencies:
    - r-readr
    - tzdata

#test.R
library(readr)
fp <- snakemake@input[[1]]
df <- read_csv(fp)
print(df)
write(df$x, "out.txt")

I run the workflow with snakemake --use-conda --use-singularity. How do I run R scripts when the Snakemake workflow is running from a Mambaforge singularity container?

Tomas Bencomo
    Does adding `Sys.setenv("TZDIR"=paste0(Sys.getenv("CONDA_PREFIX"), "/share/zoneinfo"))` to the top of the R script resolve it? If that works, I might drop an Issue on the `tzdata-feedstock` to have them include that environment variable automatically in future builds. Let me know. Seems like if someone installs `tzdata` in their env, it should override system-level versions. – merv Jun 22 '21 at 06:07
    That works! I'll create an issue on the `tzdata-feedstock` GitHub. As this solution fixes the problem, should we move this comment to an answer? – Tomas Bencomo Jun 22 '21 at 22:50

1 Answer


Looking through the stack of R code leading to the error, I see that it checks a bunch of default locations for the zoneinfo folder that tzdata includes, but also checks for a TZDIR environment variable.
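
To see which of these locations actually exists inside the container, a small shell check can help. The sketch below follows the lookup described above (TZDIR first, then fallback directories). The function name `find_zoneinfo` and the fallback list are illustrative assumptions, not R's exact internals.

```shell
#!/bin/sh
# Hypothetical helper: print the first zoneinfo directory that exists.
# TZDIR is used when it is set and points at an existing directory;
# otherwise the directories passed as arguments are tried in order.
find_zoneinfo() {
    if [ -n "${TZDIR:-}" ] && [ -d "$TZDIR" ]; then
        printf '%s\n' "$TZDIR"
        return 0
    fi
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Illustrative fallback list (an assumption, not R's exact list).
find_zoneinfo /usr/share/zoneinfo /usr/lib/zoneinfo /usr/share/lib/zoneinfo \
    || echo "no zoneinfo directory found" >&2
```

Saved under a name of your choosing and run inside the container (for example with `singularity exec docker://condaforge/mambaforge sh <script>`), this should show whether any system-level zoneinfo directory is present.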

I believe a proper solution would be for the Conda tzdata package to set this variable to point to its own zoneinfo folder. This will require a PR to the Conda Forge package (see repo issue). In the meantime, either of the following works as a workaround.
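
For illustration, Conda environments can source shell scripts from `etc/conda/activate.d/` under the environment prefix on activation, so one way such a fix might look is sketched below. This is an assumption about a possible approach, not what the tzdata package actually ships. The function name `set_tzdir_from_conda` and the file name in the comment are hypothetical.

```shell
# Hypothetical activation hook, e.g. saved as
# $CONDA_PREFIX/etc/conda/activate.d/tzdata.sh (file name is an assumption).
# Points TZDIR at the environment's zoneinfo directory, but only when
# TZDIR is not already set and that directory actually exists.
set_tzdir_from_conda() {
    zoneinfo="${CONDA_PREFIX:-}/share/zoneinfo"
    if [ -z "${TZDIR:-}" ] && [ -n "${CONDA_PREFIX:-}" ] && [ -d "$zoneinfo" ]; then
        TZDIR="$zoneinfo"
        export TZDIR
    fi
    unset zoneinfo
}

set_tzdir_from_conda
```

Leaving an existing TZDIR untouched is a deliberate choice in this sketch, so that a value set by the user or the system is not silently overridden.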

Workaround 1: Set TZDIR from R

Continuing to use the tzdata package from Conda, one could set the environment variable at the start of the R script.

#!/usr/bin/env Rscript

## the following assumes active Conda environment with `tzdata` installed
Sys.setenv("TZDIR"=paste0(Sys.getenv("CONDA_PREFIX"), "/share/zoneinfo"))

I would consider this a temporary workaround.

Workaround 2: Derive a New Docker Image

Otherwise, make a new Docker image that includes a system-level tzdata installation. This appears to be a common issue, so following other examples (and keeping things clean), it'd go something like:

Dockerfile

FROM --platform=linux/amd64 condaforge/mambaforge:latest

## include tzdata
RUN apt-get update > /dev/null \
  && DEBIAN_FRONTEND="noninteractive" apt-get install --no-install-recommends -y tzdata > /dev/null \
  && apt-get clean

Build the image from this Dockerfile, push it to Docker Hub, and use it in place of the Mambaforge image in the Snakefile's singularity directive. This is probably a more reliable long-term solution, but perhaps not everyone wants to create a Docker Hub account.
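
A minimal sketch of the build-and-push step follows. The Docker Hub user and repository name are placeholders (assumptions) to replace with your own, and the helper `build_and_push` is a hypothetical name.

```shell
# Placeholders (assumptions): replace with your own Docker Hub user and tag.
DOCKER_USER="your-dockerhub-user"
IMAGE="${DOCKER_USER}/mambaforge-tzdata:latest"

# Build from the Dockerfile in the current directory and publish the image.
# Assumes `docker login` has already been run for this account.
build_and_push() {
    docker build -t "$IMAGE" . && docker push "$IMAGE"
}

# URI to put on the Snakefile's `singularity:` line afterwards.
echo "docker://${IMAGE}"
```

Call `build_and_push` from the directory containing the Dockerfile, then replace `docker://condaforge/mambaforge` in the Snakefile with the printed URI.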

merv