I hope someone here can help me.
We have an old BeeGFS installation running version 7.1.5 on EL7, and one of the storage targets went offline (it was not replaced). After it came back, the buddy mirror entered a failed state that we cannot recover from.
If we try to bring the target back online, it fails:
[root@headnode beegfs]# beegfs-ctl --nodetype=storage --setstate --state=good --force --targetid=13
Node did not accept state change. Error: Unknown storage target
The current target states are:
[root@headnode ~]# beegfs-ctl --listtargets --nodetype=storage --state
TargetID   Reachability   Consistency   NodeID
========   ============   ===========   ======
       1         Online          Good        1
       2         Online          Good        2
       3         Online          Good        3
       4         Online          Good        4
       5         Online          Good        5
       6         Online          Good        6
       7         Online          Good        7
       8         Online          Good        8
       9         Online          Good        9
      10         Online          Good       10
      11         Online          Good       11
      12         Online          Good       12
      13        Offline          Good       13
      14         Online          Good       14
      16         Online          Good       13
Please note that a new TargetID, 16, appeared on node 13 where TargetID 13 should be. I tried to change it back to 13 but was unable to:
[root@headnode.mintrop.usp.br ~]# beegfs-ctl --removetarget 13
Given target is part of a buddy mirror group. Aborting.
[root@n13 ~]# beegfs-ctl --removemirrorgroup --mirrorgroupid=7 --nodetype=storage --dry-run
Could not remove buddy group: Communication error
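For anyone who wants to spot this kind of mismatch in the target listing without reading the table by eye, here is a minimal sketch. It assumes each storage node is meant to own exactly one target (as appears to be the case in the listing above) and that the output keeps the four-column layout shown there. `check_targets` is a hypothetical helper name, not part of BeeGFS.

```shell
# Hypothetical helper (not a BeeGFS tool): reads the output of
# `beegfs-ctl --listtargets --nodetype=storage --state` on stdin and reports
# targets that are not Online, plus nodes that own more than one target.
# Assumption: one target per storage node is the intended layout.
check_targets() {
    awk '
        # Data rows start with a numeric TargetID; header and separator lines are skipped.
        $1 ~ /^[0-9]+$/ && NF >= 4 {
            if ($2 != "Online")
                printf "target %s on node %s is %s\n", $1, $4, $2
            count[$4]++
            ids[$4] = (ids[$4] == "" ? $1 : ids[$4] "," $1)
        }
        END {
            for (n in count)
                if (count[n] > 1)
                    printf "node %s has %d targets: %s\n", n, count[n], ids[n]
        }
    '
}
```

It would be used as `beegfs-ctl --listtargets --nodetype=storage --state | check_targets`. On setups where a node legitimately hosts several targets, the second report is expected and can be ignored.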
I think we are doing something wrong, possibly because the buddy mirror setup can be difficult to handle.
Any help is greatly appreciated. Thank you.
PS: For completeness, the checks seem to be fine:
[root@headnode.mintrop.usp.br ~]# beegfs-df
METADATA SERVERS:
TargetID   Cap. Pool      Total       Free     %    ITotal     IFree     %
========   =========      =====       ====     =    ======     =====     =
       1      normal   218.2GiB    66.9GiB   31%    109.2M    107.8M   99%

STORAGE TARGETS:
TargetID   Cap. Pool      Total       Free     %    ITotal     IFree     %
========   =========      =====       ====     =    ======     =====     =
[ERROR from beegfs-storage n13.mintrop.usp.br [ID: 13]: Unknown storage target]
      13   emergency     0.0GiB     0.0GiB    0%      0.0M      0.0M    0%