
My (otherwise healthy) ceph cluster is in a HEALTH_WARN state. It was bootstrapped with cephadm.

The warning is "1 failed cephadm daemon(s)". `ceph log last cephadm` shows these two messages every 10 minutes or so, but otherwise nothing:

    cephadm [INF] Detected new or changed devices on
    cephadm [INF] Adjusting osd_memory_target on to 6532M

`ceph health detail` returns:

    [WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
        daemon osd.23 on is in error state

`ceph osd tree` output:

ID  CLASS  WEIGHT     TYPE NAME          STATUS  REWEIGHT  PRI-AFF
-1         143.71681  root default
-3         100.05257      host dberams5
 0    hdd    9.09569          osd.0          up   1.00000  1.00000
 1    hdd    9.09569          osd.1          up   1.00000  1.00000
 2    hdd    9.09569          osd.2          up   1.00000  1.00000
 3    hdd    9.09569          osd.3          up   1.00000  1.00000
 4    hdd    9.09569          osd.4          up   1.00000  1.00000
 5    hdd    9.09569          osd.5          up   1.00000  1.00000
 6    hdd    9.09569          osd.6          up   1.00000  1.00000
 7    hdd    9.09569          osd.7          up   1.00000  1.00000
 8    hdd    9.09569          osd.8          up   1.00000  1.00000
 9    hdd    9.09569          osd.9          up   1.00000  1.00000
10    hdd    9.09569          osd.10         up   1.00000  1.00000
-5          43.66425      host dberams6
11    hdd    3.63869          osd.11         up   1.00000  1.00000
12    hdd    3.63869          osd.12         up   1.00000  1.00000
13    hdd    3.63869          osd.13         up   1.00000  1.00000
14    hdd    3.63869          osd.14         up   1.00000  1.00000
15    hdd    3.63869          osd.15         up         0  1.00000
16    hdd    3.63869          osd.16         up   1.00000  1.00000
17    hdd    3.63869          osd.17         up   1.00000  1.00000
18    hdd    3.63869          osd.18         up   1.00000  1.00000
19    hdd    3.63869          osd.19         up   1.00000  1.00000
20    hdd    3.63869          osd.20         up   1.00000  1.00000
21    hdd    3.63869          osd.21         up   1.00000  1.00000
22    hdd    3.63869          osd.22         up   1.00000  1.00000

osd.23 was removed previously, but apparently not removed completely? It isn't listed in `cephadm ls`.
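For reference, the checks behind that statement look roughly like this (a sketch, commands only; `ceph osd find 23` is the extra check suggested in the comments below):

    # on the OSD host: list the daemons cephadm currently manages there
    cephadm ls | grep osd.23

    # cluster-side: ask where osd.23 lives, if it still exists at all
    ceph osd find 23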

Output from `cephadm ceph-volume lvm list`:

====== osd.11 ======

  [block]       /dev/ceph-2fb72d39-fa86-4ae8-b4c8-6208fa163d50/osd-block-f891ae90-adac-4d74-9684-6e0efa4d8742

      block device              /dev/ceph-2fb72d39-fa86-4ae8-b4c8-6208fa163d50/osd-block-f891ae90-adac-4d74-9684-6e0efa4d8742
      block uuid                nuQmVe-eh35-uLTK-UxE5-k9g0-0re9-7mk7fT
      cephx lockbox secret
      cluster fsid              74ed6c1c-a3eb-11ed-9f3e-bfd1f90fb41c
      cluster name              ceph
      crush device class
      encrypted                 0
      osd fsid                  f891ae90-adac-4d74-9684-6e0efa4d8742
      osd id                    11
      osdspec affinity          cost_capacity
      type                      block
      vdo                       0
      devices                   /dev/sdb

====== osd.12 ======

  [block]       /dev/ceph-8fac741c-6bb7-43dd-9598-6953ea4686bb/osd-block-378caf5e-db6a-4a89-862b-100dfcbc5917

      block device              /dev/ceph-8fac741c-6bb7-43dd-9598-6953ea4686bb/osd-block-378caf5e-db6a-4a89-862b-100dfcbc5917
      block uuid                qW5y6o-uTxY-CFh6-vrS5-99Zy-oQvN-Bm474v
      cephx lockbox secret
      cluster fsid              74ed6c1c-a3eb-11ed-9f3e-bfd1f90fb41c
      cluster name              ceph
      crush device class
      encrypted                 0
      osd fsid                  378caf5e-db6a-4a89-862b-100dfcbc5917
      osd id                    12
      osdspec affinity          cost_capacity
      type                      block
      vdo                       0
      devices                   /dev/sdd

====== osd.13 ======

  [block]       /dev/ceph-2d01e1c7-1140-45e4-8791-70ef4fdc39a4/osd-block-39deeac8-60f5-49c4-a4e1-37fa195a3c78

      block device              /dev/ceph-2d01e1c7-1140-45e4-8791-70ef4fdc39a4/osd-block-39deeac8-60f5-49c4-a4e1-37fa195a3c78
      block uuid                zT7Nuk-JAx4-HccW-Jp6V-vEHo-pkMV-h4OWTo
      cephx lockbox secret
      cluster fsid              74ed6c1c-a3eb-11ed-9f3e-bfd1f90fb41c
      cluster name              ceph
      crush device class
      encrypted                 0
      osd fsid                  39deeac8-60f5-49c4-a4e1-37fa195a3c78
      osd id                    13
      osdspec affinity          cost_capacity
      type                      block
      vdo                       0
      devices                   /dev/sde

====== osd.14 ======

  [block]       /dev/ceph-4472b664-880a-477e-b1d7-eb09e1b367f2/osd-block-adc99ded-45c3-47b6-95fd-67e5addfb777

      block device              /dev/ceph-4472b664-880a-477e-b1d7-eb09e1b367f2/osd-block-adc99ded-45c3-47b6-95fd-67e5addfb777
      block uuid                cHei6V-mYb1-E29M-tX6e-7ObJ-rrfs-2gnmvb
      cephx lockbox secret
      cluster fsid              74ed6c1c-a3eb-11ed-9f3e-bfd1f90fb41c
      cluster name              ceph
      crush device class
      encrypted                 0
      osd fsid                  adc99ded-45c3-47b6-95fd-67e5addfb777
      osd id                    14
      osdspec affinity          cost_capacity
      type                      block
      vdo                       0
      devices                   /dev/sdf

====== osd.15 ======

  [block]       /dev/ceph-a927dc91-e082-4137-a00b-0e0ee4e8d8a9/osd-block-3a131776-13b4-4837-bb5a-d95f598a4ad4

      block device              /dev/ceph-a927dc91-e082-4137-a00b-0e0ee4e8d8a9/osd-block-3a131776-13b4-4837-bb5a-d95f598a4ad4
      block uuid                DbaumG-R1z2-v0qN-Zfzn-Q9HA-RmQ2-yJkV2y
      cephx lockbox secret
      cluster fsid              74ed6c1c-a3eb-11ed-9f3e-bfd1f90fb41c
      cluster name              ceph
      crush device class
      encrypted                 0
      osd fsid                  3a131776-13b4-4837-bb5a-d95f598a4ad4
      osd id                    15
      osdspec affinity          cost_capacity
      type                      block
      vdo                       0
      devices                   /dev/sdg

====== osd.16 ======

  [block]       /dev/ceph-49bffd00-74d5-4614-877b-29401f5a2c67/osd-block-6e086946-0e3f-40de-a7ad-0c874f28d00e

      block device              /dev/ceph-49bffd00-74d5-4614-877b-29401f5a2c67/osd-block-6e086946-0e3f-40de-a7ad-0c874f28d00e
      block uuid                2Iu8rt-h6vt-bzH3-9r3W-ALq6-3DSC-l8JGXu
      cephx lockbox secret
      cluster fsid              74ed6c1c-a3eb-11ed-9f3e-bfd1f90fb41c
      cluster name              ceph
      crush device class
      encrypted                 0
      osd fsid                  6e086946-0e3f-40de-a7ad-0c874f28d00e
      osd id                    16
      osdspec affinity          cost_capacity
      type                      block
      vdo                       0
      devices                   /dev/sdh

====== osd.17 ======

  [block]       /dev/ceph-ecb0bcb6-abbb-4de8-b6ce-790122135126/osd-block-00bdb5d7-a0c0-401a-b9d9-922804c1859d

      block device              /dev/ceph-ecb0bcb6-abbb-4de8-b6ce-790122135126/osd-block-00bdb5d7-a0c0-401a-b9d9-922804c1859d
      block uuid                dJK4vz-vFAm-chiu-m31h-qSru-02lH-GuG8NJ
      cephx lockbox secret
      cluster fsid              74ed6c1c-a3eb-11ed-9f3e-bfd1f90fb41c
      cluster name              ceph
      crush device class
      encrypted                 0
      osd fsid                  00bdb5d7-a0c0-401a-b9d9-922804c1859d
      osd id                    17
      osdspec affinity          cost_capacity
      type                      block
      vdo                       0
      devices                   /dev/sdj

====== osd.18 ======

  [block]       /dev/ceph-43928c3a-75d4-4d43-b4ad-23df5d86920c/osd-block-9db61f69-3f29-4aa8-af60-b8054cbb7021

      block device              /dev/ceph-43928c3a-75d4-4d43-b4ad-23df5d86920c/osd-block-9db61f69-3f29-4aa8-af60-b8054cbb7021
      block uuid                ZCStpy-dhAz-mm7A-mL5Z-DzKQ-06KE-vOWGvw
      cephx lockbox secret
      cluster fsid              74ed6c1c-a3eb-11ed-9f3e-bfd1f90fb41c
      cluster name              ceph
      crush device class
      encrypted                 0
      osd fsid                  9db61f69-3f29-4aa8-af60-b8054cbb7021
      osd id                    18
      osdspec affinity          cost_capacity
      type                      block
      vdo                       0
      devices                   /dev/sdk

====== osd.19 ======

  [block]       /dev/ceph-8ef71992-0a59-477e-abed-3f10d410846a/osd-block-0c6da4d7-3d99-4fc1-864f-64c9a31fcd13

      block device              /dev/ceph-8ef71992-0a59-477e-abed-3f10d410846a/osd-block-0c6da4d7-3d99-4fc1-864f-64c9a31fcd13
      block uuid                A2eV0Q-WrUO-a4w8-ymgg-Ccob-CiEZ-XYAnis
      cephx lockbox secret
      cluster fsid              74ed6c1c-a3eb-11ed-9f3e-bfd1f90fb41c
      cluster name              ceph
      crush device class
      encrypted                 0
      osd fsid                  0c6da4d7-3d99-4fc1-864f-64c9a31fcd13
      osd id                    19
      osdspec affinity          cost_capacity
      type                      block
      vdo                       0
      devices                   /dev/sdl

====== osd.20 ======

  [block]       /dev/ceph-ea80ef75-69a2-45d5-af62-e285ce8d83a9/osd-block-d6754b00-fdcc-4c22-8ee8-eeccb4ae0ecc

      block device              /dev/ceph-ea80ef75-69a2-45d5-af62-e285ce8d83a9/osd-block-d6754b00-fdcc-4c22-8ee8-eeccb4ae0ecc
      block uuid                uxJaFM-7RTr-tYvV-yJqf-qIcp-q6bY-qBSdyk
      cephx lockbox secret
      cluster fsid              74ed6c1c-a3eb-11ed-9f3e-bfd1f90fb41c
      cluster name              ceph
      crush device class
      encrypted                 0
      osd fsid                  d6754b00-fdcc-4c22-8ee8-eeccb4ae0ecc
      osd id                    20
      osdspec affinity          cost_capacity
      type                      block
      vdo                       0
      devices                   /dev/sdm

====== osd.21 ======

  [block]       /dev/ceph-26731711-76c6-4c5e-b83e-87925494868c/osd-block-19d857a1-01f2-4b97-a60e-f980b4ec9dde

      block device              /dev/ceph-26731711-76c6-4c5e-b83e-87925494868c/osd-block-19d857a1-01f2-4b97-a60e-f980b4ec9dde
      block uuid                JdqE3L-nUsO-PRXc-1wnq-FG7N-qTNT-MBFcZ1
      cephx lockbox secret
      cluster fsid              74ed6c1c-a3eb-11ed-9f3e-bfd1f90fb41c
      cluster name              ceph
      crush device class
      encrypted                 0
      osd fsid                  19d857a1-01f2-4b97-a60e-f980b4ec9dde
      osd id                    21
      osdspec affinity          cost_capacity
      type                      block
      vdo                       0
      devices                   /dev/sdn

====== osd.22 ======

  [block]       /dev/ceph-135b609d-160c-47e6-827f-196b23c9c0e2/osd-block-f6ed6a44-bd86-47a2-a8a1-6855885d12b3

      block device              /dev/ceph-135b609d-160c-47e6-827f-196b23c9c0e2/osd-block-f6ed6a44-bd86-47a2-a8a1-6855885d12b3
      block uuid                FsTp5u-UqU9-8Pvd-ZbIi-4sB2-NOZ0-WJ5XXV
      cephx lockbox secret
      cluster fsid              74ed6c1c-a3eb-11ed-9f3e-bfd1f90fb41c
      cluster name              ceph
      crush device class
      encrypted                 0
      osd fsid                  f6ed6a44-bd86-47a2-a8a1-6855885d12b3
      osd id                    22
      osdspec affinity          cost_capacity
      type                      block
      vdo                       0
      devices                   /dev/sdo

====== osd.23 ======

  [block]       /dev/ceph-08d254a9-09cc-43c8-b61c-7642a6ff5d64/osd-block-d503ab8c-ae9c-4301-b55a-7d357f50e43c

      block device              /dev/ceph-08d254a9-09cc-43c8-b61c-7642a6ff5d64/osd-block-d503ab8c-ae9c-4301-b55a-7d357f50e43c
      block uuid                KowfZe-3n3b-Ftwd-Qnww-QLS4-Ec0a-E3KxhH
      cephx lockbox secret
      cluster fsid              74ed6c1c-a3eb-11ed-9f3e-bfd1f90fb41c
      cluster name              ceph
      crush device class
      encrypted                 0
      osd fsid                  d503ab8c-ae9c-4301-b55a-7d357f50e43c
      osd id                    23
      osdspec affinity          cost_capacity
      type                      block
      vdo                       0
      devices                   /dev/sdp

Any idea how to resolve the warning?

This is happening in a new Ceph Octopus cluster. Aside from looking at the logs to try to find an error, I haven't done anything to attempt to resolve the issue yet.

LucasY
  • What is the output of `ceph health detail`? Apparently, cephadm tried to deploy some daemon (mon, osd, mds, crash, etc.) and failed. With the `detail` output you'll be able to see which daemon failed, then check on the affected node what happened. If all the daemons you expected are deployed, you can check for orphans with `cephadm ls`; if there's one that doesn't belong there, remove it with `cephadm rm-daemon --name `. You could also try to fail over the active MGR: `ceph mgr fail`. By the way, Octopus is already EOL, I'd recommend at least upgrading to Pacific. – eblock Mar 01 '23 at 07:34
  • Thanks for your guidance. I've modified the post to include the info you requested, but I'm still not sure how to proceed since the OSD daemon in question isn't included in `cephadm ls`. Octopus is already EOL? Ok, I guess I'll look into upgrade paths. That's discouraging. How long is 'life' for ceph? – LucasY Mar 02 '23 at 18:43
  • Actually, I just double checked, it looks like I'm on 16.2.11 (pacific), so I guess that much is ok. – LucasY Mar 02 '23 at 18:43
  • Please also add `ceph osd tree` and `ceph osd find 23` to the question. Can you describe how exactly you removed the OSD? Did you do that with the orchestrator? `ceph orch rm osd ID`? – eblock Mar 02 '23 at 20:32
  • Updated again - I used the Ceph dashboard to remove the OSD. To be specific, I believe I "purged" the OSD from the Cluster -> OSD menu. – LucasY Mar 02 '23 at 21:09
  • Okay, so it seems to have been removed from the crushmap, too. You could run a `ceph mgr fail` to let a standby MGR take over; the warning might clear after that, or it could pop up again after a few minutes. On one of the two hosts you should see something in `/var/log/ceph/cephadm.log` about OSD.23. Is there a free disk on that host that has not been wiped correctly? If your OSDs are deployed automatically (check `ceph orch ls osd --export` to see if you have something like "all available devices" set to true) and you don't want that to happen, you have to disable it... – eblock Mar 03 '23 at 08:27
  • `ceph orch apply osd --all-available-devices --unmanaged=true` would be the command to do that; check out the [docs](https://docs.ceph.com/en/quincy/cephadm/services/osd/#declarative-state). If you want that OSD to be deployed again, zap it: `cephadm ceph-volume zap --destroy /dev/sdX` or `ceph orch device zap --destroy :`. One more comment on your osd tree: you'll run into issues with that setup, I can almost guarantee it. Only two hosts means you'll probably have inactive PGs as soon as one host goes down (maintenance, reboot, power, etc.). – eblock Mar 03 '23 at 08:31
  • And because the OSD sizes differ so much between the hosts, one host will have many more PGs than the other, also leading to problems. If this is only a test cluster to get familiar with Ceph, it's fine, you'll learn a lot. But if you plan to have actual (important) data in it, I recommend reading the docs to understand more of what I meant so you can prevent data loss. – eblock Mar 03 '23 at 08:33
  • Ok, a few things. 1) I can't find any mention of osd.23 in the cephadm.log files since the 27th of February when the OSD failed. 2) I'm set up in --all-available-devices mode, but there are no unused devices on either of my hosts. 3) I'm aware of the silliness of my current cluster structure. This is a temporary cluster to replace an existing one (upgrading from Mimic is hard), so as soon as I can decommission the old one I'll be able to balance things much more neatly. Do you still think I should `ceph mgr fail`? Is there not a way to clear the warning since nothing is currently failing? – LucasY Mar 06 '23 at 15:51
  • A failover of the mgr is a pretty harmless operation, so try that first if you haven't. There are many situations where a mgr can kind of "stall", a failover can clear that situation, so give that a try. So IIUC both hosts each have 11 OSDs, and one of them had 12? Then you removed OSD.23 and now you have the same amount of drives again? Or was OSD.23 part of the existing 11 drives? It's not entirely clear. Can you verify the deployed OSDs per host with `cephadm ceph-volume lvm list` and compare to `lsblk` output? It would help if you could write up what happened exactly, after mgr failover. – eblock Mar 06 '23 at 20:24
  • I think I understand the problem now - the disk is still physically in the system (/dev/sdp), and so ceph is trying to use it since --use-all-available is enabled. So I need to identify the failing disk and remove it from the system so that ceph doesn't try to create an osd out of it. I'll do a bit of research and let you know what I find. I've attached the relevant output to the question above in case you were curious. – LucasY Mar 08 '23 at 18:03
  • It’s not only in the system but it’s also discovered by cephadm because it has the lvm labels etc. That’s why cephadm tried to activate the disk. Not sure if it’s an issue with removing disks via dashboard, but if the disk also had been wiped then the all-available-devices would have recreated the OSD, but successfully. Since that seems not to be the case I assume the disk was not wiped and therefore stayed in the system as lvm which triggers cephadm to activate it, leading to the shown warning. The OSD auth caps probably have been deleted so it can’t be activated successfully. – eblock Mar 09 '23 at 17:00
  • Removing the drive from the system makes the most sense here if you want the all available devices setting. – eblock Mar 09 '23 at 17:01
  • @eblock, I can't thank you enough for your help in tracking down the root cause. You're a rockstar! – LucasY Mar 09 '23 at 17:12

1 Answer


In my case, though certainly not in all cases, the issue was a failing disk that hadn't been properly removed from the cluster. The wording of "1 failed cephadm daemon(s)" was somewhat misleading. The state of the cluster was as follows:

  1. The cluster was configured to allocate all available disks as OSDs.
  2. An OSD disk had failed, and the corresponding OSD had been removed through the Ceph dashboard.
  3. The disk physically remained in the server.

This caused Ceph to attempt to reclaim the disk (I think), and the OSD daemon would fail repeatedly since the disk itself had failed.
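An alternative, if you don't want cephadm reclaiming drives automatically, is to turn that behaviour off (roughly the commands eblock pointed to in the comments; `--unmanaged=true` only stops automatic creation of new OSDs, existing ones are untouched):

    # show the applied OSD service spec(s); look for all-available-devices / unmanaged
    ceph orch ls osd --export

    # stop cephadm from automatically turning every free-looking device into an OSD
    ceph orch apply osd --all-available-devices --unmanaged=true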

My resolution was:

  1. Stop and disable the systemd unit for that OSD (in my case osd.23); see the sketch after this list.
  2. Remove the /etc/systemd/system/ manifest for that daemon.
  3. sudo systemctl daemon-reload && sudo systemctl reset-failed
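A minimal sketch of steps 1 and 3, assuming cephadm's usual ceph-<cluster-fsid>@osd.<id> unit naming (substitute your own fsid and OSD id; the fsid below is the one from the ceph-volume output above):

    # stop the failing containerized OSD and keep systemd from restarting it
    sudo systemctl stop ceph-74ed6c1c-a3eb-11ed-9f3e-bfd1f90fb41c@osd.23.service
    sudo systemctl disable ceph-74ed6c1c-a3eb-11ed-9f3e-bfd1f90fb41c@osd.23.service

    # after removing the unit file (step 2), reload systemd and clear the failed state
    sudo systemctl daemon-reload && sudo systemctl reset-failed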

At this point the Docker ceph-osd container was no longer restarting on repeat, but it still appeared as if the disk was mounted as an OSD. So I attempted to zap the disk with:

  4. ceph-volume lvm zap --destroy /dev/sdX (where X is your device name)

But that failed: the device was busy, so wipefs couldn't do its thing.

Then I found this little gem: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/NIS2PTFS2JEKHJIWN7VQ73MKTROZOQ6Y/

It describes how to un-busy the device. Here are the steps I took from that thread:

  1. Check if lvs / pvs / vgs show some undefined device. In that case, you may have to flush the lvmetad cache:

    pvscan --cache

    Check again whether zapping works after this; if not, continue.
    
  2. Check the major and minor number of the device:

    $ ls -la /dev/sdi
    brw-rw----. 1 root disk 66, 96 13. Jan 18:14 /dev/sdi

    In this case, it would be 66:96

  3. Check if device mapper still has it locked:

    $ dmsetup ls --tree

    ceph--1f6780e6--b120--4876--b674--aa3337847114-osd--block--1325f49b--fead--40ba--957e--ec6b2968d456 (253:1)
     └─ (66:96)

    => In this case, it is still mapped!

  4. Attempt to remove it:

    $ dmsetup remove ceph--1f6780e6--b120--4876--b674--aa3337847114-osd--block--1325f49b--fead--40ba--957e--ec6b2968d456

In my situation, after the dmsetup remove command I was able to zap the disk successfully.

Then all that was left was to determine which disk in my chassis was /dev/sdp. To do this, I found that the following command made the drive's activity light flash long enough for me to identify the failed drive and physically remove it from the server:

dd if=/dev/sdp of=/dev/null
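If you'd rather not leave that running open-ended, a bounded variant does the same job (a sketch assuming GNU coreutils; the 60-second limit is arbitrary):

    # read from the failing disk for up to 60 seconds so its activity LED
    # blinks long enough to spot the drive in the chassis
    sudo timeout 60 dd if=/dev/sdp of=/dev/null bs=1M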

I hope this isn't helpful to any of you, because it means you never came across this problem. For the rest of you, I hope it's immensely helpful!

LucasY