I have a problem with a MongoDB sharded cluster. I have 4 replica sets (shards), all holding a very similar number of chunks, but the estimated data per chunk differs widely between them.
mongos> db.hadCompressed.getShardDistribution()
Shard rsmmhad01 at rsmmhad01/mmhad01a:27017,mmhad01b:27017
data : 578.74GiB docs : 933429549 chunks : 23812
estimated data per chunk : 24.88MiB
estimated docs per chunk : 39199
Shard rsmmhad02 at rsmmhad02/mmhad02a:27017,mmhad02b:27017
data : 55.18GiB docs : 91659330 chunks : 23812
estimated data per chunk : 2.37MiB
estimated docs per chunk : 3849
Shard rsmmhad03 at rsmmhad03/mmhad03a:27017,mmhad03b:27017
data : 218.62GiB docs : 368024030 chunks : 23814
estimated data per chunk : 9.4MiB
estimated docs per chunk : 15454
Shard rsmmhad04 at rsmmhad04/mmhad04a:27017,mmhad04b:27017
data : 406.31GiB docs : 640265568 chunks : 23814
estimated data per chunk : 17.47MiB
estimated docs per chunk : 26886
Totals
data : 1258.88GiB docs : 2033378477 chunks : 95252
Shard rsmmhad01 contains 45.97% data, 45.9% docs in cluster, avg obj size on shard : 665B
Shard rsmmhad02 contains 4.38% data, 4.5% docs in cluster, avg obj size on shard : 646B
Shard rsmmhad03 contains 17.36% data, 18.09% docs in cluster, avg obj size on shard : 637B
Shard rsmmhad04 contains 32.27% data, 31.48% docs in cluster, avg obj size on shard : 681B
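The per-chunk estimates are simply each shard's data (or document count) divided by its chunk count. Recomputing them from the figures above in plain JavaScript (Node.js, not the mongo shell) makes the pattern explicit: every shard holds about 25% of the chunks, while its share of the data ranges from about 4% to 46%. As far as I know, the balancer in MongoDB releases before 6.0 evens out the number of chunks per shard rather than the data volume, which would produce exactly this picture. The `shards` array and the `summarize` helper are illustrative names, not part of any MongoDB API.

```javascript
// Per-shard figures copied from the getShardDistribution() output above.
const GiB = 1024 ** 3;
const MiB = 1024 ** 2;

const shards = [
  { name: "rsmmhad01", dataGiB: 578.74, docs: 933429549, chunks: 23812 },
  { name: "rsmmhad02", dataGiB: 55.18, docs: 91659330, chunks: 23812 },
  { name: "rsmmhad03", dataGiB: 218.62, docs: 368024030, chunks: 23814 },
  { name: "rsmmhad04", dataGiB: 406.31, docs: 640265568, chunks: 23814 },
];

// Reproduce the "estimated data/docs per chunk" lines (value / chunks)
// and compare each shard's share of chunks with its share of data.
function summarize(shards) {
  const totalData = shards.reduce((sum, s) => sum + s.dataGiB, 0);
  const totalChunks = shards.reduce((sum, s) => sum + s.chunks, 0);
  return shards.map((s) => ({
    name: s.name,
    mibPerChunk: (s.dataGiB * GiB) / MiB / s.chunks,
    docsPerChunk: Math.floor(s.docs / s.chunks),
    pctChunks: (100 * s.chunks) / totalChunks,
    pctData: (100 * s.dataGiB) / totalData,
  }));
}

for (const r of summarize(shards)) {
  console.log(
    `${r.name}: ${r.mibPerChunk.toFixed(2)} MiB/chunk, ` +
      `${r.docsPerChunk} docs/chunk, ` +
      `${r.pctChunks.toFixed(2)}% of chunks, ${r.pctData.toFixed(2)}% of data`
  );
}
```

Small differences in the last digit compared with the shell output come from the GiB values already being rounded in the output.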
The chunk size in the cluster is set to 64 MB. The shard key and chunk distribution for the collection are:
db.had
shard key: {
"chkin" : 1,
"n" : 1,
"occ" : 1,
"nid" : 1,
"rtype" : 1,
"gid" : 1,
"hid" : 1
}
unique: false
balancing: true
chunks:
rsmmhad01 23812
rsmmhad02 23812
rsmmhad03 23814
rsmmhad04 23814
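All four per-chunk averages are well below the configured 64 MB chunk size. This short plain-JavaScript sketch expresses each shard's average chunk as a fraction of that maximum. It treats 64 MB as 64 MiB to match the units of the getShardDistribution() output, which is an assumption; `avgChunkMiB` and `fillRatios` are illustrative names only. The result is roughly 39% on rsmmhad01 versus under 4% on rsmmhad02, about a tenfold spread in average chunk fill.

```javascript
// Assumption: the 64 MB chunk size is compared as 64 MiB,
// the unit used by getShardDistribution().
const MAX_CHUNK_MIB = 64;

// "estimated data per chunk" values from the output above, in MiB.
const avgChunkMiB = {
  rsmmhad01: 24.88,
  rsmmhad02: 2.37,
  rsmmhad03: 9.4,
  rsmmhad04: 17.47,
};

// Fraction of the configured chunk size used by an average chunk on each shard.
function fillRatios(avg, maxMiB) {
  const out = {};
  for (const [shard, mib] of Object.entries(avg)) {
    out[shard] = mib / maxMiB;
  }
  return out;
}

const ratios = fillRatios(avgChunkMiB, MAX_CHUNK_MIB);
for (const [shard, r] of Object.entries(ratios)) {
  console.log(`${shard}: average chunk is ${(100 * r).toFixed(1)}% of ${MAX_CHUNK_MIB} MiB`);
}
```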
There are not many jumbo chunks in the database (no more than 10 or 12).
Would it help to decrease the chunk size, for example from 64 MB to 16 MB? Or would changing the shard key help?
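To try a smaller chunk size, the cluster-wide value is stored in the `config.settings` collection under `_id: "chunksize"`, with the value given in megabytes. The following is a minimal mongo shell sketch, run against a mongos. It assumes a MongoDB version that still uses this setting to drive automatic chunk splitting.

```javascript
// mongo shell, connected to a mongos.
// Sets the cluster-wide chunk size to 16 MB (value is in megabytes).
// Assumption: this MongoDB version still splits chunks based on this setting.
db.getSiblingDB("config").settings.updateOne(
  { _id: "chunksize" },
  { $set: { _id: "chunksize", value: 16 } },
  { upsert: true }
);
```

According to the MongoDB documentation, lowering the value does not re-split existing chunks immediately. Automatic splitting only happens on inserts and updates, so it can take time for chunks to reach the new size.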
Thanks in advance.