As answered in this stackoverflow post, there is currently no supported way to customize datalab with extra python modules. My suggestion would be to run the python script/cron job on a separate system outside of datalab, as you would with any python script unrelated to datalab.
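For example, if all you need is a periodically executed python job, a plain cron entry on any machine or VM is enough. The schedule, script path, and log path below are placeholders, not taken from the post:

```
# m h dom mon dow  command
# Run the job at minute 0 of every hour, appending output to a log file
0 * * * * /usr/bin/python /home/myuser/myjob.py >> /var/log/myjob.log 2>&1
```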
Really Long Side Note:
If you have to run the program within the datalab container because you want to make use of the datalab specific gcp libraries, then I propose the following unsupported (yet creative) setup that has worked for me. However, it involves running a local datalab container, as well as a cloud datalab container.
- Install datalab locally
- Append the following to the `Dockerfile.in` file at `$REPO_DIR/containers/datalab/Dockerfile.in`:

```
# Add a custom script which calls a custom program (python file)
ADD mycustomprogram.sh /usr/local/bin/mycustomprogram.sh
# Allow the script to be executed
RUN chmod +x /usr/local/bin/mycustomprogram.sh
```
- Modify the `ENTRYPOINT` variable in `$REPO_DIR/containers/datalab/run.sh` to point to your custom script
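For reference, `mycustomprogram.sh` can be a small wrapper that launches your program and then hands control back to datalab's normal startup. This is only a sketch under assumptions: the python file location and the original entrypoint path are hypothetical and need to match your checkout of the datalab repo:

```shell
#!/bin/sh
# mycustomprogram.sh -- hypothetical wrapper; all paths are assumptions.

# Start the custom python program in the background so it runs
# alongside the notebook server.
python /usr/local/bin/mycustomprogram.py &

# Hand off to the original datalab entrypoint so the container
# still starts up normally.
exec /datalab/run.sh
```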
Now you have a custom script running inside the local datalab container.
With the local setup, you can still commit to the same Google hosted git repository using any git client from your host machine. gcloud has a simple prompt that will guide you through the process of cloning the Google hosted git repository.
Simply run `gcloud init`.
After signing in, you should see the following prompt which asks you whether you want to use a Google hosted repository:
```
Do you want to use Google's source hosting (Y/n)?
```
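Once initialized, the hosted repository behaves like any other git remote. As a rough sketch (the repository name `default` and the directory name are assumptions, not taken from the post; verify the name with `gcloud source repos list`):

```shell
# Clone the Google hosted repository for the current project
gcloud source repos clone default datalab-notebooks
cd datalab-notebooks

# Commit and push notebooks as with any git remote
git add .
git commit -m "Add notebook changes"
git push origin master
```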
IMPORTANT NOTE: This is only a temporary workaround while we wait for additional datalab customization options. I would much prefer to edit the cloud Dockerfile.in file, rather than deploy a local datalab instance, in order to install a custom python program.