I am trying to run a Snakemake pipeline on a Kubernetes cluster (GKE). The job is launched from a GCE VM. Sometimes it works, but mostly it doesn't.
The steps I took were:
gcloud container clusters get-credentials snakemake-k8s-demo
kubectl delete pod $(kubectl get pods | grep snakejob|colprint 1)
snakemake --kubernetes --container-image eu.gcr.io/scailyte-is/snakemake-gsdk --use-conda --default-remote-provider GS --default-remote-prefix xxxxxx-snakemake-test-1 --jobs 2
This first run completed successfully.
I then deleted the files created by the snakemake pipeline and ran the identical job again without changing anything.
The job failed with the following error message:
HTTPSConnectionPool(host='storage.googleapis.com', port=443): Max retries exceeded with url: /storage/v1/b/xxxxxxx-snakemake-test-1/o?projection=noAcl (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f3ee35159d0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))
According to the Google Cloud Status Dashboard, there are no reported problems with Google Cloud Storage.
Subsequent attempts failed in the same way.
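Since the error is a name-resolution failure rather than a storage error, one way to narrow it down would be to check whether storage.googleapis.com resolves at all from the machine where the failing process runs (the VM, or a job pod). The following is only a minimal sketch using Python's standard library; the function name can_resolve and its resolver parameter are my own illustrative choices and not part of Snakemake.

```python
import socket


def can_resolve(host, port=443, resolver=socket.getaddrinfo):
    """Return the sorted unique addresses `host` resolves to, or [] on failure.

    `resolver` is a hypothetical injection point (defaults to the real
    socket.getaddrinfo) so the check can be exercised without network access.
    """
    try:
        infos = resolver(host, port)
    except socket.gaierror as err:
        # On Linux, errno -3 (EAI_AGAIN) is reported as
        # "Temporary failure in name resolution", as in the error above.
        print(f"resolution of {host} failed: {err}")
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address.
    return sorted({info[4][0] for info in infos})
```

Calling can_resolve("storage.googleapis.com") on the VM and again inside a pod should show which side cannot resolve the host: an empty list means the lookup failed there.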
Any tips for resolving this would be gratefully accepted.