I have a problem in a sharded MongoDB deployment. I have 4 replica sets, each holding a very similar number of chunks, but the estimated data per chunk differs widely between them.

    mongos> db.hadCompressed.getShardDistribution()

    Shard rsmmhad01 at rsmmhad01/mmhad01a:27017,mmhad01b:27017
     data : 578.74GiB docs : 933429549 chunks : 23812
     estimated data per chunk : 24.88MiB
     estimated docs per chunk : 39199

    Shard rsmmhad02 at rsmmhad02/mmhad02a:27017,mmhad02b:27017
     data : 55.18GiB docs : 91659330 chunks : 23812
     estimated data per chunk : 2.37MiB
     estimated docs per chunk : 3849

    Shard rsmmhad03 at rsmmhad03/mmhad03a:27017,mmhad03b:27017
     data : 218.62GiB docs : 368024030 chunks : 23814
     estimated data per chunk : 9.4MiB
     estimated docs per chunk : 15454

    Shard rsmmhad04 at rsmmhad04/mmhad04a:27017,mmhad04b:27017
     data : 406.31GiB docs : 640265568 chunks : 23814
     estimated data per chunk : 17.47MiB
     estimated docs per chunk : 26886

    Totals
     data : 1258.88GiB docs : 2033378477 chunks : 95252
     Shard rsmmhad01 contains 45.97% data, 45.9% docs in cluster, avg obj size on shard : 665B
     Shard rsmmhad02 contains 4.38% data, 4.5% docs in cluster, avg obj size on shard : 646B
     Shard rsmmhad03 contains 17.36% data, 18.09% docs in cluster, avg obj size on shard : 637B
     Shard rsmmhad04 contains 32.27% data, 31.48% docs in cluster, avg obj size on shard : 681B

The chunk size in the cluster is set to 64 MB. The shard key and chunk distribution are as follows:

        db.had
                shard key: {
                    "chkin" : 1,
                    "n" : 1,
                    "occ" : 1,
                    "nid" : 1,
                    "rtype" : 1,
                    "gid" : 1,
                    "hid" : 1
                }
                unique: false
                balancing: true
                chunks:
                        rsmmhad01   23812
                        rsmmhad02   23812
                        rsmmhad03   23814
                        rsmmhad04   23814

I don't have many jumbo chunks in the database (no more than 10 or 12).

Would it be useful to decrease the chunk size from 64 MB to 16 MB, for example? Or would a change of shard key help?

Thanks in advance.

RbT
  • Based on your snippet of `sh.status()` output, the chunks are balanced (by count) as expected. Have you deleted any significant historical data? If so, it is possible you have empty chunks which can be [merged](https://docs.mongodb.com/manual/tutorial/merge-chunks-in-sharded-cluster/) if they are unlikely to get new data. If you have a composite shard key with seven fields and are still getting jumbo chunks, this does suggest poor cardinality. What are the data types of the fields in your shard key? – Stennie Aug 26 '18 at 23:35
  • @Stennie Thanks for everything! The problem was empty chunks. I've merged all of them and now all replica sets are balanced. – RbT Sep 07 '18 at 13:07
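
For readers facing the same symptom, the empty-chunk cleanup described in the comments can be sketched roughly as follows. This is only an illustration, not the procedure that was actually run here: the helper `findMergeRanges` and the `{ shard, min, max, docs }` record shape are hypothetical, and it assumes you have already listed the collection's chunks in shard key order and counted the documents in each (for example with the `dataSize` command). It groups contiguous empty chunks that sit on the same shard, since only such runs are candidates for a merge.

```javascript
// Hypothetical helper: group contiguous empty chunks on the same shard.
// `chunks` must be sorted by shard key range. Each entry is assumed to look
// like { shard, min, max, docs }, where docs is the document count in the chunk.
function findMergeRanges(chunks) {
  const ranges = [];
  let run = [];

  // Record the current run if it spans at least two chunks, then reset it.
  const flush = () => {
    if (run.length >= 2) {
      ranges.push({
        shard: run[0].shard,
        bounds: [run[0].min, run[run.length - 1].max],
        chunks: run.length,
      });
    }
    run = [];
  };

  for (const chunk of chunks) {
    const last = run[run.length - 1];
    // Simplified contiguity check; real bounds may hold BSON types
    // (MinKey, dates) that need a proper comparison instead.
    const extendsRun =
      last !== undefined &&
      chunk.docs === 0 &&
      chunk.shard === last.shard &&
      JSON.stringify(last.max) === JSON.stringify(chunk.min);

    if (!extendsRun) flush();
    if (chunk.docs === 0) run.push(chunk);
  }
  flush();
  return ranges;
}

// Made-up sample data, using only the first shard key field for brevity.
const sample = [
  { shard: "rsmmhad02", min: { chkin: 0 }, max: { chkin: 10 }, docs: 0 },
  { shard: "rsmmhad02", min: { chkin: 10 }, max: { chkin: 20 }, docs: 0 },
  { shard: "rsmmhad01", min: { chkin: 20 }, max: { chkin: 30 }, docs: 39199 },
];
console.log(JSON.stringify(findMergeRanges(sample)));
```

Each returned range could then be passed to the `mergeChunks` admin command from a `mongos`, along the lines of `db.adminCommand({ mergeChunks: "<database>.<collection>", bounds: range.bounds })`, with the namespace filled in for your own collection. Merging only makes sense for ranges that are unlikely to receive new data.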
