
I use MLflow in a Docker environment, as described in this example, and I start my runs with `mlflow run .`.

I get output like this:

2019/07/17 16:08:16 INFO mlflow.projects: === Building docker image mlflow-myproject-ab8e0e4 ===
2019/07/17 16:08:18 INFO mlflow.projects: === Created directory /var/folders/93/xt2vz36s7jd1fh9bkhkk9sgc0000gn/T/tmp1lxyqqw9 for downloading remote URIs passed to arguments of type 'path' ===
2019/07/17 16:08:18 INFO mlflow.projects: === Running command 'docker run --rm -v /Users/foo/bar/mlruns:/mlflow/tmp/mlruns -e MLFLOW_RUN_ID=ef21de61d8a6436b97b643e5cee64ae1 -e MLFLOW_TRACKING_URI=file:///mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 mlflow-myproject-ab8e0e4 python train.py' in run with ID 'ef21de61d8a6436b97b643e5cee64ae1' ===

I would like to mount a Docker volume named `my_docker_volume` into the container at the path `/data`. So instead of the `docker run` shown above, I would like to use

docker run --rm --mount source=my_docker_volume,target=/data -v /Users/foo/bar/mlruns:/mlflow/tmp/mlruns -e MLFLOW_RUN_ID=ef21de61d8a6436b97b643e5cee64ae1 -e MLFLOW_TRACKING_URI=file:///mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 mlflow-myproject-ab8e0e4 python train.py

I see that I could, in principle, run it once without the mounted volume, copy the `docker run ...` command, and add `--mount source=my_docker_volume,target=/data`, but I'd rather use something like

mlflow run --mount source=my_docker_volume,target=/data .

but this obviously doesn't work, because `--mount` is not a parameter of `mlflow run`. What's the recommended way of mounting a Docker volume then?

simweb

2 Answers


A similar issue has been brought up on the MLflow issue tracker; see "Access large data from within a Docker environment". An excerpt from it says:

However, MLFlow Docker environments currently only have access to data baked into the repository or image or must download a large dataset for each run.

...

A potential solution is to enable the user to mount a volume (e.g. local directory containing the data) into the Docker container.

Looks like this is a feature others would benefit from too. The best course of action here would be to contribute support for mounts, or to keep track of the issue until someone else implements it.

Why do you need to mount the `/data` folder in the first place? There's another issue, with a PR containing a fix related to storing artifacts in a custom location on the host machine; could it be what you're looking for?

oldhomemovie
  • Thanks for the links. Indeed [#1441](https://github.com/mlflow/mlflow/issues/1441) is basically the same as my question. I'm looking forward to seeing the progress there. – simweb Jul 18 '19 at 15:00
  • @simweb it would make sense to post your workaround as a response to your own question for others to be able to find & learn from it – oldhomemovie Jul 18 '19 at 15:47

Finally, to avoid the above problem and to facilitate volume mounting, I now run my experiments using three interacting Docker containers: one runs the machine-learning code, one runs an MLflow tracking server, and one runs a PostgreSQL server. I closely followed this walk-through article to set things up. It works nicely, and docker-compose makes volume mounting easy. Metrics, parameters, and metadata are stored in a database that is mounted to a local persistent volume. Artifacts are logged in the directory `/mlflow`, or, if you prefer, in a Docker volume.
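A minimal sketch of what such a `docker-compose.yml` can look like. The service names, image names, port, and database credentials below are illustrative assumptions on my part, not taken from the article:

```yaml
version: "3.7"

services:
  db:
    image: postgres:11
    environment:
      POSTGRES_USER: mlflow
      POSTGRES_PASSWORD: mlflow      # illustrative only; use a secret in practice
      POSTGRES_DB: mlflow
    volumes:
      - postgres-store:/var/lib/postgresql/data

  mlflow:
    image: mlflow-server             # hypothetical image with mlflow installed
    depends_on:
      - db
    ports:
      - "5000:5000"
    command: >
      mlflow server
      --backend-store-uri postgresql://mlflow:mlflow@db:5432/mlflow
      --default-artifact-root /mlflow
      --host 0.0.0.0
    volumes:
      - ./mlflow:/mlflow

  trainer:
    image: mlflow-myproject          # the image built from the project
    depends_on:
      - mlflow
    environment:
      MLFLOW_TRACKING_URI: http://mlflow:5000
    command: python train.py
    volumes:
      - my_docker_volume:/data       # the named volume from the question

# named volumes must be declared at the top level
volumes:
  postgres-store:
  my_docker_volume:
```

With this in place, `docker-compose up` replaces the `mlflow run` / `docker run` invocation, and the training container sees the data volume at `/data`.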

Note: there's a typo in the cited walk-through article.

In `docker-compose.yml`, it shouldn't be

volumes:
  - ./postgres-store:/var/lib/postgresql/data

which would bind-mount a local folder named `postgres-store`. Instead, to mount the named Docker volume `postgres-store`, you should use

volumes:
  - postgres-store:/var/lib/postgresql/data
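
Note that for the named-volume form to work, docker-compose also requires the volume to be declared under the top-level `volumes` key; otherwise it reports the volume as undefined. A minimal sketch (the `db` service name and image tag are illustrative):

```yaml
services:
  db:
    image: postgres:11
    volumes:
      - postgres-store:/var/lib/postgresql/data

# top-level declaration of the named volume
volumes:
  postgres-store:
```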
simweb
  • Could you describe how you call `mlflow run` using `docker-compose.yml` in the walk-through? – user666 Oct 23 '19 at 03:54
  • @user666 I don't call `mlflow run` anymore. For my purposes, `mlflow run` was just needed to basically do `docker run` with correct propagation of the environment variables. Now the environment variables are set in the `docker-compose.yml`, and instead of calling `docker run` I use `docker-compose up`. – simweb Jan 30 '20 at 08:58