
I am unsure if this is the right platform to ask, but hopefully it is :).

I've got a 3-node Ceph setup:

node1 (14.2.22): mds.node1, mgr.node1, mon.node1, osd.0, osd.1, osd.6
node2 (14.2.22): mds.node2, mon.node2, osd.2, osd.3, osd.7
node3 (14.2.22): mds.node3, mon.node3, osd.4, osd.5, osd.8

For some reason though, when I take one node down, it does not start backfilling/recovery at all. It just reports 3 OSDs down, as shown below, but does nothing to repair it...

If I run a ceph -s I get the below output:

[root@node1 testdir]# ceph -s
  cluster:
    id:     8932b76b-282b-4385-bee8-5c295af88e74
    health: HEALTH_WARN
            3 osds down
            1 host (3 osds) down
            Degraded data redundancy: 30089/90267 objects degraded (33.333%), 200 pgs degraded, 512 pgs undersized
            1/3 mons down, quorum node1,node2

  services:
    mon: 3 daemons, quorum node1,node2 (age 2m), out of quorum: node3
    mgr: node1(active, since 48m)
    mds: homeFS:1 {0=node1=up:active} 1 up:standby-replay
    osd: 9 osds: 6 up (since 2m), 9 in (since 91m)

  data:
    pools:   4 pools, 512 pgs
    objects: 30.09k objects, 144 MiB
    usage:   14 GiB used, 346 GiB / 360 GiB avail
    pgs:     30089/90267 objects degraded (33.333%)
             312 active+undersized
             200 active+undersized+degraded

  io:
    client:   852 B/s rd, 2 op/s rd, 0 op/s wr

[root@node1 testdir]#

The odd thing is, when I boot up my 3rd node again it does recover and sync. But while the node is down it looks like backfilling just isn't starting at all... Is there something that might be causing it?

Update: What I did notice is that if I mark a drive as out, it does recover... But when the server node is down and the drive is marked as out, it does not recover at all...

Update 2: I noticed while experimenting that if the OSD is up but out, it does recover... When the OSD is marked as down, it does not begin to recover at all...
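
Roughly, the out/in experiment can be reproduced with commands like these (the OSD id is only an example):

    # mark an OSD out while its daemon is still up -- recovery/backfill starts
    ceph osd out 8

    # bring it back in once done
    ceph osd in 8

    # show the up/down and in/out state of every OSD, grouped by host
    ceph osd tree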

Marcel
  • If it helps anyone: I found that Ceph by default sets the replication to 3... which is fine, but if a replica of the data already exists on a server it won't place another replica on that same server, which means it won't heal with replication 3 when there are only 3 servers and one is down. With replication 2 it does heal perfectly well. So there was no fault :) – Marcel Oct 05 '21 at 14:43
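
For anyone hitting the same situation, the pool replication size and the CRUSH failure domain can be checked with something like this (the pool name homefs_data is only a placeholder):

    # list every pool with its size (replica count) and min_size
    ceph osd pool ls detail

    # replica count of a single pool
    ceph osd pool get homefs_data size

    # show which failure domain (e.g. host) each CRUSH rule uses
    ceph osd crush rule dump

With the failure domain set to host, size 3 and only 3 hosts, a down host leaves no other host to place the third replica on, so the PGs stay undersized instead of backfilling.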

1 Answer


By default Ceph waits 10 minutes before it marks down OSDs as out (mon_osd_down_out_interval). This helps in the case where a server just needs a reboot and returns within 10 minutes; then all is good. If you need a longer maintenance window, you're not sure whether it will take more than 10 minutes, but the server will eventually return, run ceph osd set noout to prevent unnecessary rebalancing.
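
A rough sketch of the relevant commands (the values are only examples):

    # check the current down->out interval (seconds; default 600 = 10 minutes)
    ceph config get mon mon_osd_down_out_interval

    # raise it if maintenance regularly takes longer, e.g. to 30 minutes
    ceph config set mon mon_osd_down_out_interval 1800

    # or, for a planned maintenance window, stop OSDs from being marked out at all
    ceph osd set noout
    # ... reboot / maintain the node ...
    ceph osd unset noout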

eblock