
I've been trying to improve our Ceph recovery speed, but every option I've come across in the Ceph documentation and on various forums seems to have no effect.

At this point I've tried several combinations of options found online, with no change in recovery speed. The current settings were applied with:

# Apply the settings at runtime to every OSD in the cluster
for i in $(sudo ceph osd ls)
do
  sudo ceph tell osd.$i injectargs --osd-max-backfills=7 \
                                   --osd-recovery-max-active=50 \
                                   --osd-recovery-op-priority=100 \
                                   --osd-recovery-max-active-hdd=50 \
                                   --osd-client-op-priority=3
done
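Because injectargs only changes the running daemons (and the values are lost when an OSD restarts), it can help to confirm what each OSD is actually using. A minimal sketch, assuming Octopus's `ceph config show <who> <key>` command; the function name `show_recovery_settings` is hypothetical:

```bash
#!/usr/bin/env bash
# Print the value each running OSD is actually using for the options set above.
# Run as a user with the admin keyring (or prefix the ceph calls with sudo).
show_recovery_settings() {
  local id opt
  for id in $(ceph osd ls); do
    for opt in osd_max_backfills osd_recovery_max_active \
               osd_recovery_max_active_hdd osd_recovery_op_priority \
               osd_client_op_priority; do
      # `ceph config show <who> <key>` reports the daemon's running value
      printf 'osd.%s %s=%s\n' "$id" "$opt" "$(ceph config show "osd.$id" "$opt")"
    done
  done
}
```

For example, `show_recovery_settings | grep osd_max_backfills`. With around a thousand OSDs this makes thousands of calls, so spot-checking a few OSDs directly with `ceph config show osd.0` may be quicker.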

No matter what I change, recovery stays at around:

  io:
    client:   857 MiB/s rd, 357 MiB/s wr, 748 op/s rd, 745 op/s wr
    recovery: 53 MiB/s, 16 objects/s

Any help on how to get ceph to recover faster would be greatly appreciated.

Ceph Version: 15.2.9
OS: Ubuntu 20.04
Storage Type: SATA HDDs
Network: 2x 10 Gbps per node, LACP teaming mode
Number of Nodes: 15
Disks per Node: 5 nodes with 90x 10 TB disks and 10 nodes with 60x 14 TB disks
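To put the network question in numbers, here is a rough check using only the figures listed above. It ignores protocol overhead, replication traffic and client load, so treat it as a back-of-the-envelope sketch rather than a measurement:

```latex
\begin{aligned}
\text{link capacity per node} &= 2 \times 10\ \text{Gbit/s} = 20\ \text{Gbit/s} = 2.5 \times 10^{9}\ \text{B/s} \approx 2384\ \text{MiB/s} \\
\text{HDDs in the cluster} &= 5 \times 90 + 10 \times 60 = 1050 \\
\text{recovery per HDD (average)} &\approx \frac{53\ \text{MiB/s}}{1050} \approx 0.05\ \text{MiB/s}
\end{aligned}
```

LACP hashes each flow onto one member link, so a single TCP connection is still capped at 10 Gbit/s, but that is still far above 53 MiB/s. On these numbers the network does not look like the limit, and the recovery work is spread very thinly across the disks.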

  • What is the network bandwidth between the OSD nodes? – 0xF2 May 08 '21 at 17:16
  • Your client IO looks quite high. Take a look at disk saturation (e.g. with iostat). I assume the disks are already at 100% utilization, in which case you can't increase recovery speed. – eblock May 08 '21 at 21:06
  • @0xF2 I updated my question with more details in response to your question. – user2074364 May 10 '21 at 00:28
  • @eblock Looking at iostat, I see quite a bit of variance in tps across the OSD disks, from near 0 up to ~100 on a few of them. However, iowait is rather low on all the nodes: around 3-4% iowait and 92-93% idle. – user2074364 May 10 '21 at 00:31
  • How far did you push the --osd-max-backfills setting? You could try increasing it a few times, checking how things go before each next increment. I'd try to get to 48 or so if client IO is not affected too much, just to get the OSDs to near 100% utilization. – eblock May 10 '21 at 06:36
  • Thanks, that was it. Apparently I was still being too conservative with osd-max-backfills. Once I got to around 10-15, recovery sped up. – user2074364 May 13 '21 at 01:06
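The step-by-step approach suggested in the comments (raise max-backfills a bit, watch, repeat) can be sketched as a small loop. This is a minimal sketch, not a tested procedure: `ramp_backfills` is a hypothetical helper, and it uses `ceph tell 'osd.*' config set`, which like injectargs changes only the running value:

```bash
#!/usr/bin/env bash
# Raise osd_max_backfills in steps, pausing after each step so the effect on
# recovery and client I/O can be checked before going further.
# Usage: ramp_backfills START END STEP PAUSE_SECONDS
ramp_backfills() {
  local n=$1 end=$2 step=$3 pause=$4
  while [ "$n" -le "$end" ]; do
    # Like injectargs, this changes the running value only (lost on OSD restart).
    ceph tell 'osd.*' config set osd_max_backfills "$n" > /dev/null
    echo "osd_max_backfills=$n"
    sleep "$pause"
    # Show the client and recovery throughput lines from the status output.
    ceph status | grep -E 'client:|recovery:' || true
    n=$((n + step))
  done
}
```

For example, `ramp_backfills 8 48 4 600` steps from 8 to 48 in increments of 4, waiting ten minutes between steps; stop the loop (Ctrl-C) once client IO starts to suffer.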

1 Answer


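You need to set osd_recovery_sleep_hdd to 0. It is the pause (in seconds) between recovery/backfill ops on HDD-backed OSDs, 0.1 by default. Then raise the backfill and recovery limits: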

sudo ceph tell 'osd.*' injectargs --osd_recovery_sleep_hdd=0
sudo ceph tell 'osd.*' injectargs --osd-max-backfills=300 --osd-recovery-max-active=900
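injectargs changes are lost when an OSD restarts. On Octopus the same settings can instead be stored in the monitors' configuration database with `ceph config set`, then removed with `ceph config rm` once recovery is done. A minimal sketch; the function names are hypothetical and the values just mirror the ones above. Values already injected at runtime should keep taking precedence on a running OSD until it restarts:

```bash
#!/usr/bin/env bash
# Persist the recovery tuning for all OSDs in the monitor config database
# (survives OSD restarts, unlike injectargs).
# Usage: boost_recovery MAX_BACKFILLS RECOVERY_MAX_ACTIVE
boost_recovery() {
  ceph config set osd osd_recovery_sleep_hdd 0
  ceph config set osd osd_max_backfills "$1"
  ceph config set osd osd_recovery_max_active "$2"
}

# Remove the entries again once all PGs are active+clean,
# so the OSDs fall back to their default values.
restore_recovery_defaults() {
  local opt
  for opt in osd_recovery_sleep_hdd osd_max_backfills osd_recovery_max_active; do
    ceph config rm osd "$opt"
  done
}
```

For example, `boost_recovery 300 900` during recovery, then `restore_recovery_defaults` afterwards.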

To check recovery progress, run:

ceph pg stat
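To follow progress over time while tuning, `ceph pg stat` can be sampled in a loop with timestamps. A minimal sketch; `watch_recovery` is a hypothetical helper (for an interactive view, `watch -n 60 ceph pg stat` does a similar job without logging):

```bash
#!/usr/bin/env bash
# Log a timestamped `ceph pg stat` line COUNT times, INTERVAL seconds apart,
# to follow recovery progress while tuning.
# Usage: watch_recovery INTERVAL COUNT
watch_recovery() {
  local interval=$1 count=$2 i=0
  while [ "$i" -lt "$count" ]; do
    printf '%s %s\n' "$(date '+%F %T')" "$(ceph pg stat)"
    i=$((i + 1))
    # Skip the sleep after the last sample.
    if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
  done
}
```

For example, `watch_recovery 60 30 | tee recovery.log` records one sample per minute for half an hour.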
Hackaholic