
I am trying to run a script through AWS Batch, following the "fetch and run" tutorial from here. In particular, the entry point script is the same: fetch_and_run.sh, a script that downloads the code to be executed on AWS Batch from an S3 bucket. However, no matter how I try to execute it on AWS, I always receive:

CannotStartContainerError: API error (400): OCI runtime create failed: 
  container_linux.go:348: starting container process caused "exec:
  \"/usr/local/bin/fetch_and_run.sh\": 
  stat /usr/local/bin/fetch_and_run.sh: no such file or directory": unknown

I am able to start the same container locally.
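
For context, the entry point does roughly the following (a simplified sketch of the tutorial's fetch_and_run.sh, assuming the BATCH_FILE_S3_URL/BATCH_FILE_TYPE contract used below; the real script also validates its inputs and supports zip bundles):

#!/bin/bash
# Simplified sketch: download the job script named by BATCH_FILE_S3_URL
# from S3 and execute it (covers the BATCH_FILE_TYPE=script case only)
set -e
TMPFILE=$(mktemp)
aws s3 cp "${BATCH_FILE_S3_URL}" "${TMPFILE}"
chmod +x "${TMPFILE}"
exec "${TMPFILE}" "${@}"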

I submit the job from the AWS CLI with the following command:

aws batch submit-job --job-name mss_dev --job-definition mapper \
  --job-queue bio-job-queue \
  --container-overrides '{
    "environment": [
      {"name": "BATCH_FILE_S3_URL", "value": "s3://test/myjob.sh"},
      {"name": "BATCH_FILE_TYPE", "value": "script"}
    ],
    "command": ["/usr/local/bin/fetch_and_run.sh"]
  }'

My Dockerfile is the following:

FROM amazonlinux:latest

# General dependencies and user
## aws-cli installed twice (here for root, later for user)
RUN yum -y install which unzip tar wget aws-cli curl sudo
RUN yum -y groupinstall 'Development Tools'
RUN yum -y install gcc git curl make zlib-devel bzip2 bzip2-devel readline-devel sqlite sqlite-devel openssl openssl-devel
RUN yum -y install java-1.8.0-openjdk.x86_64
## User and work directory
RUN groupadd -r user && useradd -mr -g user -d /home/user -s /sbin/nologin -c "Docker image user" user
RUN echo "user ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
ENV HOME /home/user
## Change user to user
USER user
ENV USER user
RUN sh -c "$(curl -fsSL https://raw.githubusercontent.com/Linuxbrew/install/master/install.sh)" && echo 'export PATH="/home/linuxbrew/.linuxbrew/bin:$PATH"' >>~/.profile
## GNU parallel 10 seconds installation
#WORKDIR $HOME/tools/parallel
#RUN (wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
# RUN brew install gcc
ENV PATH "/home/linuxbrew/.linuxbrew/bin:$PATH"
RUN brew install parallel

# Pyenv
WORKDIR $HOME
RUN git clone https://github.com/pyenv/pyenv.git .pyenv

ENV PYENV_ROOT $HOME/.pyenv
ENV PATH $PYENV_ROOT/shims:$PYENV_ROOT/bin:$PATH

# Python3
RUN pyenv install 3.6.5
RUN pyenv global 3.6.5
RUN pyenv rehash

# Python3 modules
RUN pip install --upgrade pip
RUN pip install --upgrade awscli pandas scipy numpy kneed

# STAR
RUN mkdir -p $HOME/tools/STAR
WORKDIR $HOME/tools/STAR
RUN wget https://github.com/alexdobin/STAR/archive/2.6.1b.tar.gz && tar xvf 2.6.1b.tar.gz

# DropSeq
RUN mkdir -p $HOME/tools/DropSeq
WORKDIR $HOME/tools/DropSeq
RUN wget https://github.com/broadinstitute/Drop-seq/releases/download/v1.13/Drop-seq_tools-1.13.zip && unzip Drop-seq_tools-1.13.zip

# Reference and other files should be downloaded during execution
RUN mkdir -p $HOME/data
RUN mkdir -p $HOME/results
COPY --chown=user:user code /home/user/code

# Copy main files and set entrypoint
WORKDIR /tmp
ADD fetch_and_run.sh /usr/local/bin/fetch_and_run.sh
USER nobody
ENTRYPOINT ["/usr/local/bin/fetch_and_run.sh"]
# To debug
# ENTRYPOINT ["/bin/bash"]
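
For reference, this is roughly how I run the container locally (the S3 URL is the same placeholder as above, and the container needs AWS credentials for the download to succeed):

# Build the image and run it with the same environment the Batch job would get
docker build -t awsbatch/fetch_and_run .
docker run \
  -e BATCH_FILE_S3_URL="s3://test/myjob.sh" \
  -e BATCH_FILE_TYPE="script" \
  awsbatch/fetch_and_run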

1 Answer


The culprit was in the job definition (from the AWS console, see "Create a job definition" from here). For the ECR Repository URI I had forgotten to use the URI of my updated image (e.g. 012345678901.dkr.ecr.us-east-1.amazonaws.com/awsbatch/fetch_and_run) and was instead still using the default amazonlinux image, which of course does not contain /usr/local/bin/fetch_and_run.sh.
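
With a current AWS CLI, pushing the updated image to that repository looks roughly like this (the account ID, region, and repository name are the placeholders from the URI above):

# Log in to ECR, then tag and push the locally built image so the
# job definition's ECR Repository URI actually resolves to it
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin 012345678901.dkr.ecr.us-east-1.amazonaws.com
docker tag awsbatch/fetch_and_run:latest \
  012345678901.dkr.ecr.us-east-1.amazonaws.com/awsbatch/fetch_and_run:latest
docker push 012345678901.dkr.ecr.us-east-1.amazonaws.com/awsbatch/fetch_and_run:latest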

The main hint was that I was able to run the same container locally, so the image itself was fine; AWS Batch was simply pulling a different one.
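
A quick way to catch this kind of mismatch is to check which image the job definition actually points at (mapper is the job definition name from the question):

# List the image configured in each active revision of the job definition
aws batch describe-job-definitions --job-definition-name mapper --status ACTIVE \
  --query 'jobDefinitions[*].[revision,containerProperties.image]'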
