I am new to Docker, and have downloaded the TFX image using
docker pull tensorflow/tfx
However, I am unable to find anywhere how to successfully launch a container from it. Here's a naive attempt:
You can use docker image ls to get a list of the Docker images stored on your local machine. Note that an "image" is a template for a container (a lightweight, isolated process, not a full VM).
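The output looks something like this (the image ID, age, and size here are illustrative values, not real ones):

docker image ls
REPOSITORY        TAG      IMAGE ID       CREATED       SIZE
tensorflow/tfx    latest   fe507176d0e6   2 weeks ago   4.5GB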
To start a container and shell into it, you would use a command like docker run -it --rm --entrypoint bash tensorflow/tfx (-i and -t give you an interactive terminal, and --rm cleans the container up when you exit). This spins up a temporary container based on the tensorflow/tfx image.
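For example, a quick sanity check of what's in the image (this assumes python and the tfx package are on the image, which they should be for tensorflow/tfx):

docker run -it --rm --entrypoint bash tensorflow/tfx
# then, inside the container's shell:
python -c "import tfx; print(tfx.__version__)"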
By default, Docker assumes you want the image tagged latest, i.e. tensorflow/tfx:latest in that list. If you want something else, you can reference a specific image by tag or by image ID (hash), e.g. docker run -it --rm --entrypoint bash tensorflow/tfx:1.0.0 or docker run -it --rm --entrypoint bash fe507176d0e6. I typically run docker image ls first and cut & paste the image ID, so my notes can be specific about which build I'm referencing even if I later edit the relevant Dockerfile.
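If you want the full, untruncated ID rather than the short one shown above, plain docker CLI can give you that (nothing TFX-specific here):

docker image ls --no-trunc
# or, for a single image:
docker inspect --format '{{.Id}}' tensorflow/tfx:latest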
Also note that changes you make inside that container will not be saved to the image, and with --rm the container itself is removed as soon as you exit the bash shell. The shell is useful for checking the state & file structure of a constructed image. If you want to change the image itself, use a Dockerfile: each instruction in a Dockerfile produces a new intermediate image (layer) when you run docker build. If you know that something went wrong between lines 5 and 10 of the Dockerfile, you can potentially shell into each of those intermediate images in turn (with the docker run command I gave above) to see what went wrong. Kinda tedious, but it works.
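A rough sketch of that workflow, assuming the classic builder (with BuildKit, which newer Docker versions use by default, intermediate image IDs aren't printed this way; the tag name and hash below are placeholders):

DOCKER_BUILDKIT=0 docker build -t my-tfx-image .
# each step prints a line like " ---> 3f2a1b4c5d6e";
# copy the ID for the step you want to inspect, then:
docker run -it --rm --entrypoint bash 3f2a1b4c5d6e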
Also note that docker run is not equivalent to running a TFX pipeline. For the latter, you want to look into the TFX CLI commands or otherwise compile the pipeline - and probably upload it to an external Kubeflow server.
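For reference, that flow looks roughly like this (the flag names are from my memory of the tfx CLI and the endpoint is a placeholder, so double-check the TFX CLI docs for your orchestrator):

tfx pipeline create --engine=kubeflow --pipeline_path=my_pipeline.py --endpoint=http://<kubeflow-host>
tfx run create --engine=kubeflow --pipeline_name=my_pipeline --endpoint=http://<kubeflow-host>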
Also note that the Docker image is just a starting point for one piece of your TFX pipeline. A full pipeline will require you to specify the components you want, a more complete Dockerfile, and more. That's a huge topic, and IMO the existing documentation leaves a lot to be desired. The Dockerfile you create describes the image that will be distributed to each of the workers that process the full pipeline. It's the place to specify dependencies, necessary files, and other custom setup for those machines. Most ML-relevant concerns are handled in other files.
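To make that concrete, a minimal Dockerfile along those lines might look like the following (the base tag, requirements file, and copied directory are placeholders for your own project; treat it as a sketch, not a recommended setup):

FROM tensorflow/tfx:1.0.0
# extra Python dependencies your components need
COPY requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
# project code that the pipeline workers need
COPY src/ /pipeline/src/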