
I was testing drive hot-plugging on my server: I unplugged one of the drives and then plugged it back in. When it came back, the drive letter had changed from /dev/sdj to /dev/sdl, and now Ceph will not start the OSD daemon. Is there a way to point the OSD at the new drive letter, or do I need to wipe and re-add the drive?

I can also see the Ceph logical volume, and it already seems to map to the new path /dev/sdl:

$ ceph-volume lvm list
====== osd.36 ======

  [block]       /dev/ceph-862223e6-32e0-412b-8101-8b4af150db9b/osd-block-a8edc8fd-d63f-4ea5-b053-9e9c0b8aeaef

      block device              /dev/ceph-862223e6-32e0-412b-8101-8b4af150db9b/osd-block-a8edc8fd-d63f-4ea5-b053-9e9c0b8aeaef
      block uuid                pwfwY5-CpDm-Dj3X-U1ey-QCw0-2lk1-9CMtxE
      cephx lockbox secret      AQBkIKVken/LDBAAOdBm4/9Nbq8xX1cdlgYqvw==
      cluster fsid              ae1df8df-5f41-45c2-bed2-5b50929b4c7d
      cluster name              ceph
      crush device class
      encrypted                 1
      osd fsid                  a8edc8fd-d63f-4ea5-b053-9e9c0b8aeaef
      osd id                    36
      osdspec affinity
      type                      block
      vdo                       0
      devices                   /dev/sdl

Basically, is there an easy way to bring this OSD back into the cluster without destroying it and re-balancing?
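
Since LVM finds the PV/LV by UUID rather than by /dev/sdX, I would assume something along these lines could re-create the OSD's mount and symlinks against the new device node, but I have not been able to confirm it handles the encrypted (dm-crypt) case, so treat it as a sketch (the osd id and fsid are taken from the listing above):

$ pvs -o pv_name,vg_name                                        # confirm LVM still sees the PV on /dev/sdl
$ lvs ceph-862223e6-32e0-412b-8101-8b4af150db9b                 # confirm the ceph LV is present in that VG
$ ceph-volume lvm activate 36 a8edc8fd-d63f-4ea5-b053-9e9c0b8aeaef   # re-create the tmpfs mount and block symlink
$ ceph-volume lvm activate --all                                # or re-activate everything ceph-volume knows about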

-- logs

2023-07-11T09:30:26.950+0930 7ffa52e053c0 -1 bluestore(/var/lib/ceph/osd/ceph-36/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-36/block: (5) Input/output error
2023-07-11T09:30:26.950+0930 7ffa52e053c0 -1 bluestore(/var/lib/ceph/osd/ceph-36/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-36/block: (5) Input/output error
2023-07-11T09:30:26.950+0930 7ffa52e053c0  1 bdev(0x560b8ff9a000 /var/lib/ceph/osd/ceph-36/block) open path /var/lib/ceph/osd/ceph-36/block
2023-07-11T09:30:26.950+0930 7ffa52e053c0  1 bdev(0x560b8ff9a000 /var/lib/ceph/osd/ceph-36/block) open size 500086865920 (0x746f800000, 466 GiB) block_size 4096 (4 KiB) non-rotational discard not supported
2023-07-11T09:30:26.950+0930 7ffa52e053c0 -1 bluestore(/var/lib/ceph/osd/ceph-36/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-36/block: (5) Input/output error
2023-07-11T09:30:26.950+0930 7ffa52e053c0  1 bdev(0x560b8ff9a000 /var/lib/ceph/osd/ceph-36/block) close
2023-07-11T09:30:27.258+0930 7ffa52e053c0 -1 osd.36 0 OSD:init: unable to mount object store
2023-07-11T09:30:27.258+0930 7ffa52e053c0 -1  ** ERROR: osd init failed: (5) Input/output error
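
The EIO on _read_bdev_label makes me think the block symlink under /var/lib/ceph/osd/ceph-36 (or the dm-crypt mapping behind it, since this OSD is encrypted) is still pointing at the device node that went away. This is roughly what I would check, although I never got to try it:

$ ls -l /var/lib/ceph/osd/ceph-36/block     # where does the block symlink actually point?
$ lsblk -o NAME,TYPE,SIZE /dev/sdl          # is the LV / crypt mapping stacked on the new device?
$ dmsetup ls                                # any stale device-mapper entries left over from the old path?
$ systemctl restart ceph-osd@36             # after re-activating (non-containerized, systemd-managed OSD)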

cheers

donkeyx
  • Which ceph version is this? – eblock Jul 13 '23 at 06:30
  • ceph version 17.2.6 (995dec2cdae920da21db2d455e55efbc339bde24) quincy (stable) – donkeyx Jul 21 '23 at 06:20
  • Usually, ceph uses the LV tags to start OSDs so it doesn't require the drive letters, that's why I asked for the ceph version. Can you share the output of `ceph osd metadata 36` and add it to the question? Have you tried to reboot the node? Maybe that would already fix it, not sure though. – eblock Jul 21 '23 at 06:52
  • Hey @eblock, it's too late for me to get the osd metadata from that OSD now; I have already deleted it and re-balanced. But if it were still there, what would the steps be to re-map it after getting that metadata? I did attempt restarting, but it never remapped, as it kept looking for the incorrect drive path each time. – donkeyx Jul 31 '23 at 00:48
  • I don't have the capabilities to test that myself, so my advice is limited. Usually I would only hot-swap a broken drive to replace it which ceph handles well. Is there a reason for hot-swapping an intact drive? Did you stop the OSD process before pulling the drive? Maybe try the [ceph-users mailing list](https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/), somebody there might have tried what you tried. – eblock Jul 31 '23 at 07:14
  • Just wondering, do you use device paths with the /dev/sdX notation in your drivegroup config? – eblock Aug 01 '23 at 18:03
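
(Edit: for anyone landing here later – I had already rebuilt the OSD so I could not try it, but the checks eblock suggests in the comments would presumably look something like this on an LVM/ceph-volume deployment:)

$ ceph osd metadata 36 | grep -E '"devices"|bluestore_bdev'     # which device paths the OSD recorded
$ lvs -o lv_name,vg_name,lv_tags | grep 'osd_id=36'             # the LV tags ceph-volume uses to find the OSD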
