
I'm running a Flask application on Kubernetes that serves an ML model. The model loads a 2 GB word-embeddings file, which is mounted with gcsfuse. The application has been running this way for about 2 years.

Since a recent restart of the pod, this setup no longer works, although nothing has changed in our code or deployment settings. While debugging, I noticed that the problem also occurs with the following Dockerfile, which does not even run the Python script:

Dockerfile:

    FROM levkuznetsov/gcsfuse-docker
    RUN apt-get update && apt-get install -y build-essential
    COPY . /app
    WORKDIR /app
    RUN /bin/bash -c "mkdir -p /app/wordembeddingtest"
    COPY ./serviceacc.json /
    ADD /serviceacc.json /etc/gcloud/serviceacc.json
    ADD /serviceacc.json /etc/gcloud/service-account.json
    EXPOSE 8080
    ENTRYPOINT ["/bin/bash", "-c", "gcsfuse bucket_name wordembeddingtest ; ls wordembeddingtest"]
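Because the two commands in the ENTRYPOINT are joined with `;`, `ls` runs even when gcsfuse fails, so an empty listing on its own does not show whether the mount succeeded. A minimal Python sketch of a check that separates the two cases is shown here. The function name `describe_mount` is hypothetical, and whether `os.path.ismount` reports a FUSE mount correctly in this particular image is an assumption that has not been verified.

```python
import os


def describe_mount(path):
    """Return a short status string for a directory expected to be a mount point.

    Hypothetical helper for illustration; it is not part of gcsfuse.
    """
    if not os.path.isdir(path):
        return "missing"
    # ismount compares the device of the path with that of its parent directory.
    if not os.path.ismount(path):
        return "not mounted"
    try:
        entries = os.listdir(path)
    except OSError as exc:
        return f"mounted but unreadable: {exc}"
    return f"mounted, {len(entries)} entries"


if __name__ == "__main__":
    # Same relative directory name as in the ENTRYPOINT above.
    print(describe_mount("wordembeddingtest"))
```

Running this in place of `ls wordembeddingtest` would show whether the directory is a plain empty folder or an actual mount point.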

What is even stranger is that we have other deployments that use the same setup, and they can be restarted and still work.

The logs show the following error:

[Screenshot of the error log; image not included]

And with --foreground --debug_invariants --debug_http --debug_gcs --debug_fuse we get the following:

[Two screenshots of the debug output; images not included]

What I have checked so far:

Service account permissions are ok
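Correct permissions on the service account do not by themselves show that the key file copied into the image is intact. A minimal sketch of a sanity check on the key file follows, assuming it is a standard service account JSON key with `type`, `client_email` and `private_key` fields. The helper name `missing_key_fields` is hypothetical, and the path in the usage line is the one from the Dockerfile.

```python
import json

# Fields assumed to be present in a standard service account JSON key.
REQUIRED_FIELDS = ("type", "client_email", "private_key")


def missing_key_fields(path):
    """Return the required fields that are absent or empty in the key file."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)  # raises ValueError if the file is not valid JSON
    return [name for name in REQUIRED_FIELDS if not data.get(name)]


if __name__ == "__main__":
    print(missing_key_fields("/etc/gcloud/service-account.json"))
```

An empty list means the file parses and the assumed fields are populated. It says nothing about whether the key is still valid on the Google Cloud side.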

What I have tried so far:

Different storage bucket

gcsfuse command with --implicit-dirs and -o allow_other

Different Kubernetes cluster

Other mount folder locations

tgogos
Brecht Coghe
