I have a hand-built Kubernetes 1.11.4 cluster on CentOS, running on AWS EC2 instances, with 1 master and 1 minion. The cluster is very stable. I want to deploy JupyterHub into the cluster. The docs here and here call out some details for provisioning EFS. I elected to go with EBS.
The pvc fails with:
Failed to get AWS Cloud Provider. GetCloudProvider returned <nil> instead
Mounted By: hub-76ffd7d94b-dmj8l
Below is the StorageClass definition:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp2
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  fsType: ext4
The pv yaml:
kind: PersistentVolume
apiVersion: v1
metadata:
  name: jupyterhub-pv
  labels:
    type: amazonEBS
spec:
  capacity:
    storage: 30Gi
  accessModes:
    - ReadWriteMany
  awsElasticBlockStore:
    volumeID: vol-0ddb700735db435c7
    fsType: ext4
The pvc yaml:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: jupyterhub-pvc
  labels:
    type: amazonEBS
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
$ kubectl -n jhub describe pvc hub-db-dir
returns:
Name:          hub-db-dir
Namespace:     jhub
StorageClass:  standard    <======== from an earlier try
Status:        Pending
Volume:
Labels:        app=jupyterhub
               chart=jupyterhub-0.8.2
               component=hub
               heritage=Tiller
               release=jhub
Annotations:   volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Events:
  Type     Reason              Age                     From                         Message
  ----     ------              ----                    ----                         -------
  Warning  ProvisioningFailed  110s (x106 over 3h43m)  persistentvolume-controller  Failed to provision volume with StorageClass "standard": Failed to get AWS Cloud Provider. GetCloudProvider returned <nil> instead
Mounted By:    hub-76ffd7d94b-dmj8l
To me, this looks like the pod is attempting to mount the storage and failing, and isolating this error has been a challenge. I tried patching the pvc to update the storageclass to gp2, which is now marked as default but was not at the time I deployed the pvc. Patching failed:
$ kubectl -n jhub patch pvc hub-db-dir -p '{"spec":{"StorageClass":"gp2"}}'
persistentvolumeclaim/hub-db-dir patched (no change)
$ kubectl -n jhub describe pvc hub-db-dir
Name:          hub-db-dir
Namespace:     jhub
StorageClass:  standard    <====== Not changed
Status:        Pending
Volume:
Labels:        app=jupyterhub
               chart=jupyterhub-0.8.2
               component=hub
               heritage=Tiller
               release=jhub
Annotations:   volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Events:
  Type     Reason              Age                      From                         Message
  ----     ------              ----                     ----                         -------
  Warning  ProvisioningFailed  2m26s (x108 over 3h48m)  persistentvolume-controller  Failed to provision volume with StorageClass "standard": Failed to get AWS Cloud Provider. GetCloudProvider returned <nil> instead
Mounted By:    hub-76ffd7d94b-dmj8l
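One thing that may explain the "no change" result: on a PVC the spec field is named storageClassName, not StorageClass, and as far as I understand it cannot be changed once the claim exists, so the claim would have to be deleted and recreated rather than patched. A minimal sketch of a claim that pins the class explicitly is below. The claim name example-claim and the 5Gi size are hypothetical and only there for illustration; hub-db-dir itself is created by the Helm chart, so in practice the class would need to be set through the chart's values.

```yaml
# Hypothetical claim for illustration only; not one of the objects above.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: example-claim          # hypothetical name
  namespace: jhub
spec:
  storageClassName: gp2        # the PVC field that selects the StorageClass
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi             # arbitrary size for the sketch
```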
JupyterHub deployment is managed by Helm/tiller, so when any changes are made, I use the following to update the pods:
$ helm upgrade jhub jupyterhub/jupyterhub --version=0.8.2 -f config.yaml
The relevant section in the config.yaml file to allocate user storage is:
proxy:
  secretToken: "<random value>"
singleuser:
  cloudMetadata:
    enabled: true
singleuser:
  storage:
    dynamic:
      storageClass: gp2
singleuser:
  storage:
    extraVolumes:
      - name: jupyterhub-pv
        persistentVolumeClaim:
          claimName: jupyterhub-pvc
    extraVolumeMounts:
      - name: jupyterhub-pv
        mountPath: /home/shared
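If these singleuser blocks all sit at the top level of the same file, it may be worth merging them: many YAML parsers either reject duplicate keys or silently keep only the last one, in which case the cloudMetadata and dynamic storage settings would be dropped. A sketch of the same settings under a single singleuser key, assuming nothing else in config.yaml touches these sections:

```yaml
proxy:
  secretToken: "<random value>"
singleuser:
  cloudMetadata:
    enabled: true
  storage:
    # dynamic provisioning for per-user volumes
    dynamic:
      storageClass: gp2
    # shared volume backed by the pre-created claim
    extraVolumes:
      - name: jupyterhub-pv
        persistentVolumeClaim:
          claimName: jupyterhub-pvc
    extraVolumeMounts:
      - name: jupyterhub-pv
        mountPath: /home/shared
```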
Part of the troubleshooting has also focused on letting the cluster know that its resources are provisioned by AWS. To that end, I have in the Kubernetes config file:
/usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
the line:
Environment="KUBELET_EXTRA_ARGS=--cloud-provider=aws --cloud-config=/etc/kubernetes/cloud-config.conf"
where: /etc/kubernetes/cloud-config.conf
contains:
[Global]
KubernetesClusterTag=kubernetes
KubernetesClusterID=kubernetes
In the files kube-controller-manager.yaml and kube-apiserver.yaml I added the line:
- --cloud-provider=aws
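For context, this is roughly where that line sits in the controller manager manifest. This is an abbreviated sketch that assumes a kubeadm-style static pod manifest (typically under /etc/kubernetes/manifests/); the other flags are elided and left unchanged. Since the error is reported by persistentvolume-controller, which runs inside the controller manager, it seems worth confirming that the running kube-controller-manager pod was actually restarted with this flag.

```yaml
# kube-controller-manager.yaml (excerpt; location and layout assumed for a kubeadm-style setup)
apiVersion: v1
kind: Pod
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - name: kube-controller-manager
    command:
    - kube-controller-manager
    - --cloud-provider=aws     # the added line
    # ...remaining flags unchanged...
```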
I have not yet tagged any AWS resources, but will start doing it based on this.
What are my next steps for troubleshooting?
Thanks!