
I am migrating my tiny Ceph cluster (CentOS 7, Luminous) from the Filestore backend to Bluestore, following the official instructions (http://docs.ceph.com/docs/master/rados/operations/bluestore-migration/).

For the last hour, PG remapping has been stuck at 33% and it's not budging. The logs are as follows:

master@ceph/ ceph osd out 2
osd.2 is already out. 
master@ceph/ ceph -w
  cluster:
    id:     e6104db4-284f-4a13-8128-570e3427a9f9
    health: HEALTH_WARN
            147718/443154 objects misplaced (33.333%)

  services:
    mon: 3 daemons, quorum node2,node3,master
    mgr: master(active), standbys: node3, node2
    mds: cephfs-1/1/1 up  {0=master=up:active}
    osd: 5 osds: 5 up, 4 in; 264 remapped pgs

  data:
    pools:   3 pools, 264 pgs
    objects: 144k objects, 8245 MB
    usage:   51972 MB used, 836 GB / 886 GB avail
    pgs:     147718/443154 objects misplaced (33.333%)
             264 active+clean+remapped

  io:
    client:   1695 B/s wr, 0 op/s rd, 0 op/s wr


2018-06-07 05:04:18.210576 mon.node2 [WRN] Health check update: 147717/443151 objects misplaced (33.333%) (OBJECT_MISPLACED)
2018-06-07 05:04:44.258809 mon.node2 [WRN] Health check update: 147718/443154 objects misplaced (33.333%) (OBJECT_MISPLACED)
2018-06-07 05:05:18.438887 mon.node2 [WRN] Health check update: 147717/443151 objects misplaced (33.333%) (OBJECT_MISPLACED)
2018-06-07 05:05:44.571445 mon.node2 [WRN] Health check update: 147718/443154 objects misplaced (33.333%) (OBJECT_MISPLACED)
2018-06-07 05:06:18.754717 mon.node2 [WRN] Health check update: 147717/443151 objects misplaced (33.333%) (OBJECT_MISPLACED)
2018-06-07 05:06:44.887698 mon.node2 [WRN] Health check update: 147718/443154 objects misplaced (33.333%) (OBJECT_MISPLACED)
0xF2
everCurious

1 Answer


Try this:

  1. Bring the OSD back in (ceph osd in 2). This will return your cluster to a healthy state.
  2. Ensure there are no scrub processes in progress.
  3. Set the noscrub and nodeep-scrub flags to avoid additional load on the server.
  4. Mark OSD 2 out again.
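
The steps above can be sketched as shell commands. This is only a minimal sketch, assuming OSD id 2 from the question and a host with an admin keyring. The `run`, `wait_for_health_ok`, `wait_for_no_scrub` and `retry_osd_out` helpers and the `DRY_RUN` switch are hypothetical names introduced for illustration, not part of Ceph; only the `ceph ...` commands themselves are real.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1 (second half): poll until the cluster reports HEALTH_OK.
wait_for_health_ok() {
    [ -n "${DRY_RUN:-}" ] && return 0
    until ceph health | grep -q '^HEALTH_OK'; do sleep 10; done
}

# Step 2: poll until no PG state in the summary mentions scrubbing.
wait_for_no_scrub() {
    [ -n "${DRY_RUN:-}" ] && return 0
    while ceph pg stat | grep -q 'scrubbing'; do sleep 10; done
}

# Sketch of the four steps for one OSD id (2 in the question).
retry_osd_out() {
    osd_id="$1"
    run ceph osd in "$osd_id"       # step 1: mark the OSD back in
    wait_for_health_ok
    wait_for_no_scrub               # step 2: no scrub in progress
    run ceph osd set noscrub        # step 3: pause scrubbing
    run ceph osd set nodeep-scrub
    run ceph osd out "$osd_id"      # step 4: mark the OSD out again
}
```

Running `DRY_RUN=1 retry_osd_out 2` only prints the commands, which is a cheap way to review them first. Once the migration is finished, the flags can be cleared again with `ceph osd unset noscrub` and `ceph osd unset nodeep-scrub`.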

Check the OSD and Ceph logs for more details if it hangs again.
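
As a rough sketch of where to look, the commands below collect the usual starting points. The log path assumes the default cluster name `ceph` and the default log directory, and must be read on the host that runs the OSD; the `run` and `inspect_osd` helpers and the `DRY_RUN` switch are hypothetical names used only for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Gather cluster state and the tail of one OSD's log (assumed default path).
inspect_osd() {
    osd_id="$1"
    run ceph health detail          # expands each health warning
    run ceph osd tree               # shows which OSDs are up/down and in/out
    run ceph pg dump pgs_brief      # per-PG state with up and acting sets
    run tail -n 100 "/var/log/ceph/ceph-osd.${osd_id}.log"
}
```

For the OSD in the question this would be called as `inspect_osd 2`.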

David Buck