
Environment:

Kubernetes cluster with 1 master and 3 nodes, Ubuntu 18.04.3 LTS (GNU/Linux 4.15.0-66-generic x86_64), running as VMware VMs.

[screenshot from dashboard]

A Pod (simple nginx image) cannot mount a specified Volume in a Kubernetes cluster using rook-ceph and the csi-cephfs storage class. It shows this error:

MountVolume.MountDevice failed for volume "pvc-9aad698e-ef82-495b-a1c5-e09d07d0e072" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-89d24230-0571-11ea-a584-ce38896d0bb2 already exists

The PVC and PV are green. The PVC is ReadWriteMany, but it also fails with ReadWriteOnce.

The Ceph cluster is HEALTH_OK and everything is green.
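
For reference, this is roughly how the health status can be checked from the Rook toolbox (a sketch; it assumes the default rook-ceph-tools deployment with the app=rook-ceph-tools label is installed):

# Find the toolbox pod and query cluster health from inside it
# (the app=rook-ceph-tools label is the Rook default and may differ)
TOOLS_POD=$(kubectl -n rook-ceph get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}')
kubectl -n rook-ceph exec -it "$TOOLS_POD" -- ceph status
kubectl -n rook-ceph exec -it "$TOOLS_POD" -- ceph osd status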

What am I missing?


Some more logs:

Normal   Scheduled               <unknown>            default-scheduler        Successfully assigned rook-ceph/csicephfs-demo-pod to <myhost>
Normal   SuccessfulAttachVolume  2m37s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-c1ad8144-15ae-49f6-a012-d866b74ff902"
Warning  FailedMount             2m17s                kubelet, <myhost>        Unable to attach or mount volumes: unmounted volumes=[mypvc], unattached volumes=[mypvc default-token-wfjxl]: timed out waiting for the condition
Warning  FailedMount             2m4s                 kubelet, <myhost>        MountVolume.MountDevice failed for volume "pvc-c1ad8144-15ae-49f6-a012-d866b74ff902" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
Warning  FailedMount             108s (x5 over 2m4s)  kubelet, <myhost>        MountVolume.MountDevice failed for volume "pvc-c1ad8144-15ae-49f6-a012-d866b74ff902" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000001-0bc5ddfc-05f2-11ea-9f0a-bee51ab2829b already exists
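
One way to dig further (as one of the comments below also suggests) is to search the CSI plugin pod logs for the stuck volume ID. A sketch, assuming the default Rook label app=csi-cephfsplugin and container name csi-cephfsplugin:

# Grep the cephfs CSI plugin pods for the stuck volume ID
# (label selector and container name are the Rook defaults and may differ)
VOLUME_ID="0001-0009-rook-ceph-0000000000000001-0bc5ddfc-05f2-11ea-9f0a-bee51ab2829b"
for pod in $(kubectl -n rook-ceph get pod -l app=csi-cephfsplugin -o name); do
  kubectl -n rook-ceph logs "$pod" -c csi-cephfsplugin | grep "$VOLUME_ID"
done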

kubectl -n rook-ceph get pv,pvc -o wide
NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                        STORAGECLASS   REASON   AGE     VOLUMEMODE
persistentvolume/pvc-c1ad8144-15ae-49f6-a012-d866b74ff902   1Gi        RWX            Delete           Bound    rook-ceph/cephfs-pvc-many2   csi-cephfs              114m    Filesystem
persistentvolume/pvc-d678dd06-7197-4342-934d-33e60edc564a   1Gi        RWO            Delete           Bound    rook-ceph/cephfs-pvc         csi-cephfs              6d19h   Filesystem

NAME                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE    VOLUMEMODE
persistentvolumeclaim/cephfs-pvc         Bound    pvc-d678dd06-7197-4342-934d-33e60edc564a   1Gi        RWO            csi-cephfs     11d    Filesystem
persistentvolumeclaim/cephfs-pvc-many2   Bound    pvc-c1ad8144-15ae-49f6-a012-d866b74ff902   1Gi        RWX            csi-cephfs     118m   Filesystem

Original PVC YAML:

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-pvc-many2
  namespace: rook-ceph
spec:
  accessModes:
  - ReadWriteMany
  volumeMode: Filesystem
  resources:
    requests:
      storage: 1Gi
  storageClassName: csi-cephfs

POD:

---
apiVersion: v1
kind: Pod
metadata:
  name: csicephfs-demo-pod
  namespace: rook-ceph
spec:
  containers:
   - name: web-server
     image: nginx
     volumeMounts:
       - name: mypvc
         mountPath: /var/lib/www/html
  volumes:
   - name: mypvc
     persistentVolumeClaim:
       claimName: cephfs-pvc-many2
       readOnly: false
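
For completeness, a sketch of how the manifests above are applied and how the pod events were collected (file names are placeholders):

# Apply the PVC and Pod manifests shown above (file names are placeholders)
kubectl apply -f cephfs-pvc-many2.yaml
kubectl apply -f csicephfs-demo-pod.yaml

# Inspect the mount attempt and collect the events shown earlier
kubectl -n rook-ceph describe pod csicephfs-demo-pod
kubectl -n rook-ceph get events --sort-by=.lastTimestamp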
– Alec
  • Please provide the output of `kubectl get pv,pvc -o wide` and your `PV` and `PVC` YAMLs. Also please provide some information about your env (Cloud/Bare metal), Windows or Linux OS. – PjoterS Nov 13 '19 at 10:29
  • Yes, I updated my question with this information. – Alec Nov 13 '19 at 10:52
  • looking at the cephfs CSI device plugins and controller pods, don't you see anything unusual? Try grepping for the volume name or ID. Also, ceph CSI provisioner tends to get stuck when nodes are missing/notready, though I would assume you're not missing any node right now? – SYN Nov 13 '19 at 11:16
  • Nodes are good: `kubectl get nodes` → NAME STATUS ROLES AGE VERSION / *** Ready 20d v1.16.2 / *** Ready 27d v1.16.2 / *** Ready 21d v1.16.2 / *** Ready master 27d v1.16.2. I will try to grep some info from the other csi and ceph pods... – Alec Nov 13 '19 at 13:03
  • No... :( Really no errors so far. 2 pods with csi-cephfsplugin and 2 pods with csi-cephfsplugin-provisioner are running without errors... – Alec Nov 13 '19 at 13:18
  • Can you provide your storage class YAML? – Matt Nov 19 '19 at 09:51
  • Did you solve it? – kool Feb 22 '21 at 14:21

2 Answers


I had this error, and what solved it for me was deleting the csi-cephfsplugin-provisioner and csi-rbdplugin-provisioner pods and letting the ReplicaSet recreate them. Once I did that, all of my PVCs created PVs and bound as expected. I may have only needed to kill the csi-rbdplugin-provisioner pods, so try that first.
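
In case it helps, a sketch of the commands (the app= label selectors are the Rook defaults and may differ in your deployment):

# Delete the CSI provisioner pods; their ReplicaSets recreate them right away
# (the label selectors below are the Rook defaults and may differ)
kubectl -n rook-ceph delete pod -l app=csi-rbdplugin-provisioner
kubectl -n rook-ceph delete pod -l app=csi-cephfsplugin-provisioner

# Confirm the replacements are running and the PVCs bind
kubectl -n rook-ceph get pod -l app=csi-rbdplugin-provisioner
kubectl -n rook-ceph get pod -l app=csi-cephfsplugin-provisioner
kubectl -n rook-ceph get pvc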

– Fred Drake

After a node restart, the rook-ceph external cluster creates the PVC and PV via the public IP successfully. However, attaching to the node fails because it tries the inaccessible cluster IP. How can I force rook-ceph to use the public IP?

I figured this out by logging into the node (via SSH) and checking the output of `sudo dmesg`. But I have no idea how to set this to the public IP instead of the private IP, because the external OSDs' cluster IPs are not accessible from the Kubernetes cluster nodes. Any suggestions would be appreciated! Thanks!
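
For reference, a sketch of the check I ran on the node (the node name is a placeholder):

# On the affected node, look for ceph/libceph kernel messages showing which monitor/OSD IPs the mount is trying
ssh <node> 'sudo dmesg | grep -iE "ceph|libceph"'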

– deb
  • If you have a new question, please ask it by clicking the [Ask Question](https://serverfault.com/questions/ask) button. Include a link to this question if it helps provide context. - [From Review](/review/late-answers/559034) – ceskib Jul 26 '23 at 15:46