I want to containerise a pipeline that was predominantly developed in Python but depends on a model trained in R. Each codebase also brings its own requirements and packages. How can I create a Docker image that lets me build a container running this Python and R code together?
For context, I have an R script that runs a (random forest) model, and it needs to be part of a data pipeline built in Python. The Python pipeline performs some processing first and generates input for the model, then executes the R script with that input, before taking the output to the next stage of the Python pipeline.
So I've created a template for this process by writing a simple test Python script, "test_call_r.py", which uses the subprocess module to call an R script, and I need to put this in a Docker container with the necessary requirements and packages for both Python and R.
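For reference, test_call_r.py boils down to something like this (the R script name and input file here are placeholders for illustration):

import subprocess

# Run the R script with the input generated by the earlier pipeline stage;
# Rscript must be on the container's PATH for this to work
result = subprocess.run(
    ["Rscript", "myscript.R", "model_input.csv"],
    capture_output=True,
    text=True,
    check=True,  # raise CalledProcessError if the R script exits non-zero
)
print(result.stdout)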
I have been able to build the Docker container for the Python pipeline itself, but I cannot successfully install R and the associated packages alongside the Python requirements, so I want to rewrite the Dockerfile to create an image that does both.
From the Docker Hub documentation I can create an image for the Python pipeline using, e.g.,
FROM python:3
WORKDIR /app
COPY requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt
COPY . /app
CMD [ "python", "./test_call_r.py" ]
And similarly from Docker Hub I can use a base R image (or a Rocker image) to create a Docker container that can run a randomForest model, e.g.,
FROM r-base
WORKDIR /app
COPY myscripts /app/
RUN Rscript -e "install.packages('randomForest')"
CMD ["Rscript", "myscript.R"]
But what I need is an image that installs the requirements and packages for both Python and R, and then executes the codebase so that the R script runs in a subprocess of the Python pipeline. How can I do this?
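My rough idea is to start from the Python image and install R on top of it via apt, something along these lines (an untested sketch; that python:3 is Debian-based with an r-base apt package available, and the CRAN mirror URL, are my assumptions):

FROM python:3

# The python image is Debian-based, so R should be installable with apt
RUN apt-get update \
    && apt-get install -y --no-install-recommends r-base \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Python requirements
COPY requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt

# R packages, installed from an explicit CRAN mirror
RUN Rscript -e "install.packages('randomForest', repos='https://cloud.r-project.org')"

COPY . /app

CMD ["python", "./test_call_r.py"]

The alternative would presumably be to go the other way, starting from r-base (or a Rocker image) and installing Python and pip with apt, but is one direction preferable to the other?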