
We use a Ceph cluster to store the images of our virtual machines. The cluster consists of 3 monitors, 4 storage nodes and 1 admin node.

ceph osd tree output:

ID WEIGHT   TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 21.82190 root default
-2  5.45547     host ceph01
 0  1.09109         osd.0        up  1.00000          1.00000
 1  1.09109         osd.1        up  1.00000          1.00000
 2  1.09109         osd.2        up  1.00000          1.00000
 3  1.09109         osd.3        up  1.00000          1.00000
 4  1.09109         osd.4        up  1.00000          1.00000
-3  5.45547     host ceph02
 5  1.09109         osd.5        up  1.00000          1.00000
 6  1.09109         osd.6        up  1.00000          1.00000
 7  1.09109         osd.7        up  1.00000          1.00000
 8  1.09109         osd.8        up  1.00000          1.00000
 9  1.09109         osd.9        up  1.00000          1.00000
-4  5.45547     host ceph03
10  1.09109         osd.10       up  1.00000          1.00000
11  1.09109         osd.11       up  1.00000          1.00000
12  1.09109         osd.12       up  1.00000          1.00000
13  1.09109         osd.13       up  1.00000          1.00000
14  1.09109         osd.14       up  1.00000          1.00000
-5  5.45547     host ceph04
16  1.09109         osd.16     down        0          1.00000
17  1.09109         osd.17     down        0          1.00000
18  1.09109         osd.18     down        0          1.00000
19  1.09109         osd.19     down        0          1.00000
15  1.09109         osd.15     down        0          1.00000
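For a listing like the one above, a small script can pull out which OSDs are down on which host. This is only a sketch: it assumes the column layout shown in this post (ID, WEIGHT, TYPE NAME, UP/DOWN, ...), and the function name and the shortened SAMPLE text are made up for illustration.

```python
from typing import Dict, List

# Shortened excerpt of the tree output posted above (illustration only).
SAMPLE = """\
ID WEIGHT   TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 21.82190 root default
-4  5.45547     host ceph03
14  1.09109         osd.14       up  1.00000          1.00000
-5  5.45547     host ceph04
16  1.09109         osd.16     down        0          1.00000
15  1.09109         osd.15     down        0          1.00000
"""


def down_osds_by_host(tree_text: str) -> Dict[str, List[str]]:
    """Map each host name to the OSDs listed as 'down' beneath it."""
    result: Dict[str, List[str]] = {}
    current_host = None
    for line in tree_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        if fields[2] == "host":
            # Host bucket line: ID, WEIGHT, "host", <name>
            current_host = fields[3]
        elif fields[2].startswith("osd.") and fields[3] == "down":
            # OSD line: ID, WEIGHT, osd.N, up/down, ...
            result.setdefault(current_host, []).append(fields[2])
    return result


if __name__ == "__main__":
    print(down_osds_by_host(SAMPLE))
```

On the full output above this would report all five OSDs of ceph04 and nothing for the other three hosts.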

First, since the last CentOS update, we have been unable to resynchronize our 4th server (ceph04). The other servers had no problems after the update. We have tried to sync with:

  • the nodown flag set
  • the VMs running
  • the VMs stopped
  • a replacement HDD
  • a different HDD slot

Does anyone have an idea or a lead for resynchronizing it?

At the moment, we're considering a fresh installation of CentOS on server ceph04.

Secondly, we want to update the cluster. Is it possible to do this without disrupting the use of the cluster (with the VMs on)?

More information

  • OS: CentOS Linux release 7.7.1908 (Core)
  • Ceph version: 10.2.11
  • Some VMs have heavy disk access.
Keny

1 Answer


Secondly, we want to update the cluster. Is it possible to do this without disrupting the use of the cluster (with the VMs on)?

Yes, it's possible. In the end, though, we decided to shut down all the VMs to speed up the synchronization. The update itself is relatively quick.
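For anyone who wants to keep the VMs running, the usual approach is a rolling restart: set the noout flag, restart the monitors one at a time, then the OSD hosts one at a time, and clear the flag at the end. The sketch below only builds the list of commands and does not run anything. The monitor hostnames (mon01 to mon03) are hypothetical, the systemd targets assume systemd-managed daemons as on CentOS 7, and the package upgrade step on each host is deliberately left out because it depends on the repository setup.

```python
from typing import List


def rolling_restart_plan(mon_hosts: List[str], osd_hosts: List[str]) -> List[str]:
    """Return the commands for a rolling restart, in order (dry run only)."""
    # noout keeps Ceph from rebalancing data while OSDs restart.
    plan = ["ceph osd set noout"]
    for host in mon_hosts:
        # Assumption: packages on `host` were upgraded before this step.
        plan.append(f"ssh {host} sudo systemctl restart ceph-mon.target")
        # Check that the monitor rejoined the quorum before moving on.
        plan.append("ceph quorum_status")
    for host in osd_hosts:
        plan.append(f"ssh {host} sudo systemctl restart ceph-osd.target")
        # Wait until placement groups are active+clean again before the next host.
        plan.append("ceph pg stat")
    plan.append("ceph osd unset noout")
    return plan


if __name__ == "__main__":
    mons = ["mon01", "mon02", "mon03"]  # hypothetical names
    osds = ["ceph01", "ceph02", "ceph03", "ceph04"]
    for step in rolling_restart_plan(mons, osds):
        print(step)
```

Doing one host at a time means only one failure domain is offline at any moment, which is what lets client I/O continue during the update.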

Does anyone have an idea or a lead for resynchronizing it?

It was a problem with iptables. Once the iptables rules were changed, the synchronization started again.
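A quick way to spot this kind of problem is to test, from the affected node, whether the Ceph TCP ports on the other nodes are reachable. By default monitors listen on port 6789 and OSDs bind to ports in the 6800-7300 range. The sketch below is a plain TCP connect test and not a Ceph tool; the helper names are made up, and a refused or timed-out connection only shows that the port is unreachable, not why.

```python
import socket
from typing import Iterable, List, Tuple

MON_PORT = 6789  # default Ceph monitor port


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def unreachable(targets: Iterable[Tuple[str, int]], timeout: float = 1.0) -> List[Tuple[str, int]]:
    """Return the (host, port) pairs that could not be connected to."""
    return [(h, p) for h, p in targets if not port_open(h, p, timeout)]
```

Run on ceph04, a call such as unreachable([("mon01", MON_PORT), ("mon02", MON_PORT), ("mon03", MON_PORT)]) would list the monitors that cannot be reached (mon01 to mon03 are hypothetical hostnames). Since an OSD only binds a few ports in its range, testing every port from 6800 to 7300 would mostly report ports where nothing is listening; it is more useful to test the addresses that ceph osd dump shows for each OSD.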

Keny