pg_num autoscaling on Ceph

Question

I am new to Ceph and have only used it in my homelab, with 3 nodes of 2 OSDs each. After reading about Nautilus and pg_num autoscaling I enabled it, but this was probably a mistake. Now my cluster has the status below. Does anyone have a tip on how to get past this?

Ceph status
  cluster:
    id:     b512a8d7-1956-4ef3-aa3e-6f24d08878cf
    health: HEALTH_WARN
            Reduced data availability: 256 pgs inactive

  services:
    mon: 3 daemons, quorum ce01,ce03,ce02 (age 17m)
    mgr: ce02(active, since 48m), standbys: ce03, ce01
    mds: cephfs:1 {0=ce03=up:active} 2 up:standby
    osd: 6 osds: 6 up (since 17m), 6 in (since 5d)

  data:
    pools:   3 pools, 288 pgs
    objects: 24 objects, 4.8 MiB
    usage:   683 GiB used, 16 TiB / 16 TiB avail
    pgs:     88.889% pgs unknown
             256 unknown
             32  active+clean

Welcome to Stack Overflow. The provided information is a little sparse. Unknown PGs can exist because the monitor is not able to talk with the OSDs. Could it be a network problem? Please check the ceph-mon and OSD logs. They should provide more insight into your problem. — itsafire, Sep 26 '19 at 09:54
I had 288 PGs over 3 pools (128, 128 and 32) and the cluster was working just fine. Then I turned on autoscaling of PGs, and it suggested setting the PG numbers down to 8, 16 and 8. When I then enabled autoscaling, this status appeared and I am no longer able to access the RBD pool. I have done no work on the network, so that should not be the problem. — raymondbh, Sep 26 '19 at 11:46
Another strange thing is that when I use the pool get pg_num command, it shows the old PG number (128) and not the new one that the autoscaler chose for it... — raymondbh, Sep 26 '19 at 11:54
Seems like this autoscaling destroyed my data in those pools :( — raymondbh, Sep 29 '19 at 17:00
Without any further insight (logs, etc.) no-one will be able to inspect this problem any further. You might also try to reach out to the Ceph community. Their IRC channel is quite helpful. — itsafire, Sep 29 '19 at 17:17
Do you know what information is needed, so I can try to provide it? — raymondbh, Sep 30 '19 at 12:14
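
The figures quoted in the status output and the comments can be cross-checked with a little arithmetic. The sketch below is only a sanity check on the numbers already posted. The pool names are hypothetical placeholders, since the thread gives only the sizes 128, 128 and 32. The observation that the 256 unknown PGs equal the combined size of the two 128-PG pools is consistent with the output but is not confirmed by any log.

```python
# Pool PG counts as quoted in the comments.
# The pool names are hypothetical; the thread only gives the sizes.
pg_num_before = {"pool_a": 128, "pool_b": 128, "pool_c": 32}

# Figures copied from the `ceph status` output in the question.
total_pgs = 288
unknown_pgs = 256
active_clean_pgs = 32


def percent(part, whole):
    """Return part/whole as a percentage rounded to 3 decimals."""
    return round(100.0 * part / whole, 3)


# The three pool sizes should add up to the reported PG total.
total_from_pools = sum(pg_num_before.values())

# Share of PGs reported as unknown (the status shows 88.889%).
unknown_pct = percent(unknown_pgs, total_pgs)

# Combined size of the two 128-PG pools, for comparison with the
# unknown count. Matching numbers are suggestive, not proof.
large_pools_pgs = sum(n for n in pg_num_before.values() if n == 128)

print("pool sizes sum to total:", total_from_pools == total_pgs)
print("unknown share (%):", unknown_pct)
print("unknown == two 128-PG pools:", large_pools_pgs == unknown_pgs)
```

If the match is not a coincidence, the 32 active+clean PGs would be the third pool and the two larger pools would be the ones affected. Output from `ceph health detail`, `ceph osd pool autoscale-status` and `ceph pg dump_stuck inactive` would likely be the kind of information needed to confirm or rule that out.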


0 Answers