I have a pool of 5 Gluster servers, each contributing one brick to a volume running in Dispersed mode. I have since added an extra 5 peers, each with one brick, in another datacenter, which turned the volume into "Distributed-Dispersed" with a brick formula of 2 x (3 + 2) = 10.
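For context, the volume was originally created as a single (3 + 2) disperse set and later expanded with a second set of 5 bricks. The commands were roughly along these lines (the hostnames and brick paths are simplified placeholders, not my exact ones):

# original 5-node dispersed volume: 5 bricks, redundancy 2, i.e. (3 + 2)
gluster volume create dev-volume disperse 5 redundancy 2 \
    pool-1-{1..5}:/bricks/dev-volume
gluster volume start dev-volume

# later, after probing the 5 new peers in the second datacenter
gluster volume add-brick dev-volume pool-2-{1..5}:/bricks/dev-volume
gluster volume rebalance dev-volume start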
After fully rebalancing the 10-peer cluster, I noticed during some tests that some files went missing from the first pool (let's call it pool-1) when all 5 peers in pool-2 were disconnected from the cluster. To my understanding this should not happen, as each pool should hold its own full set of the data in dispersed format. If I'm wrong, please correct me!
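To make the test concrete, this is roughly the kind of check I was doing (the mount point, filename, and the way I take pool-2 offline below are illustrative; in the real tests the pool-2 peers were disconnected from the cluster):

# on a client with the volume mounted, while all 10 peers are up
stat /mnt/dev-volume/data/jobs/201501/00/somefile    # exists

# on each pool-2 node, take Gluster down (glusterd plus the brick processes)
systemctl stop glusterd
pkill glusterfsd

# back on the client: some files now fail to resolve at all
stat /mnt/dev-volume/data/jobs/201501/00/somefile    # No such file or directory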
Something I noticed during the initial rebalance (which I suspect is related, but don't have the Gluster expertise to prove) is that node #4 of pool-2 reaches the "completed" stage of rebalancing in a matter of seconds, while every other node needs more than 24 hours just to get through the scanning portion. That node also lists exactly 2 "scanned" files, with none rebalanced, skipped, or failed:
Node          Rebalanced-files         size      scanned     failures      skipped         status    run time in h:m:s
---------          -----------    ---------    ---------    ---------    ---------    -----------    -----------------
localhost                  159      231.4MB       269931            0            0    in progress              3:10:26
pool-1-2                     0       0Bytes            0            0            0    in progress              3:10:26
pool-1-3                     0       0Bytes            0            0            0    in progress              3:10:25
pool-1-4                     0       0Bytes            0            0            0    in progress              3:10:26
pool-1-5                     0       0Bytes            0            0            0    in progress              3:10:26
pool-2-1                     0       0Bytes            0            0            0    in progress              3:10:26
pool-2-2                     0       0Bytes            0            0            0    in progress              3:10:26
pool-2-3                     0       0Bytes            0            0            0    in progress              3:10:26
pool-2-4                     0       0Bytes            2            0            0      completed              0:00:18
pool-2-5                     0       0Bytes            0            0            0    in progress              3:10:26
Estimated time left for rebalance to complete : 15:08:05
volume rebalance: dev-volume: success
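For reference, the status table above is just the output of the standard rebalance status command, re-run periodically from one of the pool-1 nodes:

gluster volume rebalance dev-volume status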
Drilling into the rebalance logs on pool-2-4, I found the following interesting messages:
[2020-08-20 21:24:20.623006] I [MSGID: 109081] [dht-common.c:4209:dht_setxattr] 0-dev-volume-dht: fixing the layout of /
...
[2020-08-20 21:24:29.720716] I [MSGID: 0] [dht-rebalance.c:3737:gf_defrag_total_file_cnt] 0-dev-volume-dht: Total number of files = 1684196
[2020-08-20 21:24:29.720725] E [MSGID: 0] [dht-rebalance.c:3900:gf_defrag_start_crawl] 0-dev-volume-dht: Failed to get the total number of files. Unable to estimate time to complete rebalance.
...
[2020-08-20 21:24:29.725724] I [dht-rebalance.c:2745:gf_defrag_process_dir] 0-dev-volume-dht: migrate data called on /
[2020-08-20 21:24:29.725828] W [dict.c:416:dict_set] (-->/usr/lib64/glusterfs/3.10.1/xlator/cluster/distribute.so(+0x42f51) [0x7fed71172f51] -->/lib64/libglusterfs.so.0(dict_set_int32+0x2b) [0x7fed78af14eb] -->/lib64/libglusterfs.so.0(dict_set+0xe6) [0x7fed78aefc56] ) 0-dict: !this || !value for key=readdir-filter-directories [Invalid argument]
[2020-08-20 21:24:29.725845] E [MSGID: 109003] [dht-common.c:4917:dht_opendir] 0-dev-volume-dht: Failed to set dictionary value :key = readdir-filter-directories, ret:-1
[2020-08-20 21:24:32.718807] I [dht-rebalance.c:2959:gf_defrag_process_dir] 0-dev-volume-dht: Migration operation on dir / took 2.99 secs
[2020-08-20 21:24:32.718898] W [dict.c:416:dict_set] (-->/usr/lib64/glusterfs/3.10.1/xlator/cluster/distribute.so(+0x42f51) [0x7fed71172f51] -->/lib64/libglusterfs.so.0(dict_set_int32+0x2b) [0x7fed78af14eb] -->/lib64/libglusterfs.so.0(dict_set+0xe6) [0x7fed78aefc56] ) 0-dict: !this || !value for key=readdir-filter-directories [Invalid argument]
[2020-08-20 21:24:32.723301] I [dht-rebalance.c:3994:gf_defrag_start_crawl] 0-DHT: crawling file-system completed
...
[2020-08-20 21:24:32.723730] I [MSGID: 109028] [dht-rebalance.c:4277:gf_defrag_status_get] 0-dev-volume-dht: Files migrated: 0, size: 0, lookups: 2, failures: 0, skipped: 0
[2020-08-20 21:24:32.723894] W [glusterfsd.c:1329:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7fed77958dc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x556351afaf85] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x556351afadfb] ) 0-: received signum (15), shutting down
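These excerpts come from what I believe is the per-volume rebalance log on pool-2-4 (that's where it lives on my install; the path may differ on other versions or setups):

grep -E 'Total number of files|readdir-filter-directories' \
    /var/log/glusterfs/dev-volume-rebalance.log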
Each of my other nodes begins with a "total number of files" equal to 0, and each file in each subfolder can clearly be seen being rebalanced, with messages like:
[2020-08-12 19:56:49.614327] I [dht-rebalance.c:2745:gf_defrag_process_dir] 0-dev-volume-dht: migrate data called on /data/jobs
[2020-08-12 19:56:49.820702] I [MSGID: 109081] [dht-common.c:4209:dht_setxattr] 0-dev-volume-dht: fixing the layout of /data/jobs/201501
[2020-08-12 19:56:50.294380] I [dht-rebalance.c:2745:gf_defrag_process_dir] 0-dev-volume-dht: migrate data called on /data/jobs/201501
[2020-08-12 19:56:50.518000] I [MSGID: 109081] [dht-common.c:4209:dht_setxattr] 0-dev-volume-dht: fixing the layout of /data/jobs/201501/00
[2020-08-12 19:56:50.863319] I [dht-rebalance.c:2745:gf_defrag_process_dir] 0-dev-volume-dht: migrate data called on /data/jobs/201501/00
[2020-08-12 19:56:51.116676] I [MSGID: 109081] [dht-common.c:4209:dht_setxattr] 0-dev-volume-dht: fixing the layout of /data/jobs/201501/02
I also don't get any of the !value for key=readdir-filter-directories [Invalid argument] messages on any other node.
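To double-check that, I grepped for the message across all ten nodes' rebalance logs with something like this (ssh access and the log path are assumptions about my setup):

for h in pool-1-{1..5} pool-2-{1..5}; do
    printf '%s: ' "$h"
    ssh "$h" "grep -c 'readdir-filter-directories' /var/log/glusterfs/dev-volume-rebalance.log"
done

Only pool-2-4 comes back with a non-zero count.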
If I check the total size of all the files inside the gluster mount's data directory on pool-2-4 (dispersed, so not a full representation of the data), I can see that it clearly holds a significant amount of data:
[root@pool-2-4 dev-volume]# du -csh *
8.0K backups
158G data
25M etc
8.0K lost+found
38G static
783M bin
196G total
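In case raw file counts are more telling than sizes, the comparable check would be something like this (the mount point and brick path are placeholders for my actual ones):

# full logical view of the volume via the FUSE mount
find /mnt/dev-volume -type f | wc -l

# this node's brick only, skipping Gluster's internal .glusterfs tree
find /bricks/dev-volume -path '*/.glusterfs' -prune -o -type f -print | wc -l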
Could the errors I'm seeing in the rebalance log explain why pool-1 is missing files when pool-2 is taken offline? Could they point to an entirely separate problem? Or is my understanding of how this should work simply incorrect?
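In case it's relevant to the "fixing the layout" messages above, this is how I've been peeking at the DHT layout xattrs on a brick root (the brick path is again a placeholder for my actual one):

# dump the extended attributes, including the trusted.glusterfs.dht layout range for this brick
getfattr -d -m . -e hex /bricks/dev-volume | grep -i dht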
I apologize for the slight vagueness of this question, and I'm grateful to anyone who can offer some insight.