
I am using GCSFuse to mount a GCS bucket into my user pod in JupyterHub, but it always fails with the error message "gcsfuse takes exactly two arguments".

Here is my Dockerfile:

FROM jupyter/minimal-notebook:177037d09156

ENV GCSFUSE_REPO gcsfuse-stretch
ENV GOOGLE_APPLICATIONS_CREDENTIALS=test-serviceaccount.json
ENV GCS_BUCKET: "my-bucket"
ENV GCS_BUCKET_FOLDER: "shared-data"

USER root

# Add google repositories for gcsfuse and google cloud sdk
RUN apt-get update -y && apt-get install -y --no-install-recommends apt-transport-https ca-certificates curl gnupg
RUN echo "deb http://packages.cloud.google.com/apt $GCSFUSE_REPO main" | tee /etc/apt/sources.list.d/gcsfuse.list
RUN echo "deb https://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -

# Install gcsfuse and google cloud sdk
RUN apt-get update -y  && apt-get install -y gcsfuse google-cloud-sdk \
    && apt-get autoremove -y \
    && apt-get clean -y \
    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

# Switch back to notebook user (defined in the base image)
USER $NB_UID

# make directory for mounting
RUN mkdir -p home/shared-data \
    && mkdir -p etc/scripts

COPY start_mounting.sh etc/scripts

# install extra packages required for model training
RUN pip install --upgrade pip
RUN pip install fasttext
RUN pip install ax-platform

CMD ["bin/bash", "etc/scripts/start_mounting.sh"]

Here is my start_mounting.sh script:

#!/bin/bash

# Setup GCSFuse
 gcsfuse --key-file ${GOOGLE_APPLICATIONS_CREDENTIALS} ${GCS_BUCKET} ${GCS_BUCKET_FOLDER}

My JupyterHub config.yaml:

hub:
  baseUrl: /jupyterhub
  extraConfig: |
    from kubernetes import client
    def modify_pod_hook(spawner, pod):
        pod.spec.containers[0].security_context = client.V1SecurityContext(
            privileged=True,
            capabilities=client.V1Capabilities(
                add=['SYS_ADMIN']
            )
          )
        pod.spec.containers[0].env.append(
              client.V1EnvVar(
                  name='GOOGLE_APPLICATIONS_CREDENTIALS',
                  value_from=client.V1EnvVarSource(
                      secret_key_ref=client.V1SecretKeySelector(
                          name='jhub-secret',
                          key='jhub-serviceaccount',
                      )
                  )
              )
          )
        return pod
    c.KubeSpawner.modify_pod_hook = modify_pod_hook

singleuser:
  storage:
    type: none
  extraEnv:
    GCS_BUCKET: "my-bucket"
    GCS_BUCKET_FOLDER: "shared-data"
  lifecycleHooks:
    postStart:
      exec:
        command: ["/bin/sh", "etc/scripts/start_mounting.sh"]
    preStop:
      exec:
        command: ["fusermount", "-u", "shared-data"]
  image:
    name: gcr.io/project/base-images/jhub-k8s-cust-singleuser
    tag: 1.1.6
    pullPolicy: Always

I am overriding the GOOGLE_APPLICATIONS_CREDENTIALS env var so that I can use it as the --key-file argument for gcsfuse.
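For reference, gcsfuse expects two positional arguments (the bucket name and the mount point), with the key file passed via --key-file, so the command in the script is intended to expand to something like this (a sketch using the values from my setup):

# Intended expansion: two positional arguments (bucket and mount point)
# plus the key file passed via the --key-file flag
gcsfuse --key-file test-serviceaccount.json my-bucket shared-data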

Could someone please tell me what is wrong here? Is something wrong with my pod's postStart exec command, or with my gcsfuse command?

tank
  • Can you add this at line 2 of your script: `echo ${GOOGLE_APPLICATIONS_CREDENTIALS}`? I guess it's not a file name, but the content of the JSON file. Right? – guillaume blaquiere Aug 08 '20 at 19:50
  • Oh yes, I realized it is the JSON content and not a JSON file. Thanks for pointing that out. How can I write this to a file before executing the script? – tank Aug 08 '20 at 20:22
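(A quick sketch of the check suggested in the comments above, using the same variable name:)

# Does the variable hold a path to an existing file, or the raw JSON content?
if [ -f "${GOOGLE_APPLICATIONS_CREDENTIALS}" ]; then
    echo "It is a path to a key file"
else
    echo "It is not a file path; it probably holds the JSON content itself"
fi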

2 Answers


I'm not an expert (or even a user) of JupyterHub, so my answer is generic.

I see two ways to solve your issue:

  • You can mount your secret as a file into the container at runtime (if you have your JSON key in a file). However, I don't know the JupyterHub syntax for achieving this.
  • You can try the following:

In your JupyterHub YAML file, rename the env var that receives the JSON key file content:

          pod.spec.containers[0].env.append(
              client.V1EnvVar(
                  name='GOOGLE_APPLICATIONS_CREDENTIALS_CONTENT',
                  value_from=client.V1EnvVarSource(
                      secret_key_ref=client.V1SecretKeySelector(
                          name='jhub-secret',
                          key='jhub-serviceaccount',
                      )
                  )
              )
          )

Change your script like this (write the content into the defined file):

#!/bin/bash

# Write the JSON key content into the file defined by GOOGLE_APPLICATIONS_CREDENTIALS
# (quoted so the shell does not word-split the JSON)
echo "${GOOGLE_APPLICATIONS_CREDENTIALS_CONTENT}" > "${GOOGLE_APPLICATIONS_CREDENTIALS}"

# Setup GCSFuse
gcsfuse --key-file "${GOOGLE_APPLICATIONS_CREDENTIALS}" "${GCS_BUCKET}" "${GCS_BUCKET_FOLDER}"

The container image is immutable, but I think this will work because the change is performed only at runtime, in the running container.
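(To verify, you could add a sanity check to the script before mounting; a sketch:)

# Hypothetical sanity check: make sure the key file was actually written
if [ ! -s "${GOOGLE_APPLICATIONS_CREDENTIALS}" ]; then
    echo "Key file was not written" >&2
    exit 1
fi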

Note: prefer an absolute path for the GOOGLE_APPLICATIONS_CREDENTIALS file path definition.
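For example (assuming the default notebook home directory /home/jovyan from the base image; adjust to your setup):

# e.g. in the startup script, or the equivalent ENV line in the Dockerfile
export GOOGLE_APPLICATIONS_CREDENTIALS=/home/jovyan/test-serviceaccount.json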

guillaume blaquiere

I solved it by creating a volume mount for the K8s secret (the Google service account key) and passing its path as an env var to the start_mounting.sh script for the gcsfuse command.

Below is the config that I used:

  storage:
    extraVolumes:
      - name: my-secret-jupyterhub
        secret:
          secretName: my-secret
    extraVolumeMounts:
      - name: my-secret-jupyterhub
        mountPath: /etc/secrets
        readOnly: true
  extraEnv:
    GOOGLE_APPLICATIONS_CREDENTIALS: /etc/secrets/key.json
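(For completeness, the secret itself can be created from the key file; a sketch with a hypothetical path to the downloaded key:)

# Hypothetical: create the K8s secret from the downloaded service account key,
# using key.json as the entry name so it gets mounted at /etc/secrets/key.json
kubectl create secret generic my-secret --from-file=key.json=/path/to/serviceaccount-key.json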

This seems to be a cleaner approach than reading the service account file's contents from an env var and writing them back to a file for the gcsfuse command, as I was doing previously and as discussed above.
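With the key mounted as a file, start_mounting.sh only needs the path (a sketch):

#!/bin/bash

# The key is already a file on disk, so nothing needs to be written out first
gcsfuse --key-file "${GOOGLE_APPLICATIONS_CREDENTIALS}" "${GCS_BUCKET}" "${GCS_BUCKET_FOLDER}"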

tank