
We ran a mock test on a sharded MongoDB environment. When the shard key value is a small integer, the collection does not distribute across the shards, but when the shard key is a large integer it works fine. Please read on...

Code used to insert records from the mongos shell:

var shId = 15;
for (var i = 0; i < 100; i++) {
    if (i % 50 == 0) {
        shId = shId + 1;
    }
    db.Foo.insert({ shKeyId: shId, text: "this is a test" });
}
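
(Note: starting from shId = 15, the loop above produces only two distinct shKeyId values, 16 and 17, with 50 documents each. If useful, this can be confirmed from the mongos shell:)

db.Foo.distinct("shKeyId");       // expected: [ 16, 17 ]
db.Foo.count({ shKeyId : 16 });   // expected: 50
db.Foo.count({ shKeyId : 17 });   // expected: 50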

With shId starting at 15, the Foo collection did not split across the two shards.

Environment: two shards, each shard a replica set with one primary and two secondary mongod instances. The config server runs on one of the shard hosts.

Sharding was enabled on the 'Foo' collection with shKeyId as a hashed shard key:

db.runCommand({ shardcollection : "test.Foo", key : { shKeyId : "hashed" } });
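
For reference, the shell helper form should be equivalent (sh.enableSharding is shown only in case the database had not already been enabled for sharding):

sh.enableSharding("test");
sh.shardCollection("test.Foo", { shKeyId : "hashed" });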

sh.status() output

mongos> sh.status();
--- Sharding Status --- 
  sharding version: {
    "_id" : 1,
    "version" : 3,
    "minCompatibleVersion" : 3,
    "currentVersion" : 4,
    "clusterId" : ObjectId("516ea48e979736fd306973c9")
}
  shards:
    {  "_id" : "mongo-perf-shrd1",  "host" : "mongo-perf-shrd1/sh1-prim-ip:27017,sh1-sec1-ip:27017,sh1-sec2-ip:27017" }
    {  "_id" : "mongo-perf-shrd2",  "host" : "mongo-perf-shrd2/sh2-prim-ip:27017,sh2-sec1-ip:27017,sh2-sec2-ip:27017" }
  databases:
    {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }

    {  "_id" : "test",  "partitioned" : true,  "primary" : "mongo-perf-shrd2" }
        test.Foo
            shard key: { "shKeyId" : "hashed" }
            chunks:
                mongo-perf-shrd2    2
                mongo-perf-shrd1    2
            { "shKeyId" : { "$minKey" : 1 } } -->> { "shKeyId" : NumberLong("-4611686018427387902") } on : mongo-perf-shrd2 { "t" : 2, "i" : 2 } 
            { "shKeyId" : NumberLong("-4611686018427387902") } -->> { "shKeyId" : NumberLong(0) } on : mongo-perf-shrd2 { "t" : 2, "i" : 3 } 
            { "shKeyId" : NumberLong(0) } -->> { "shKeyId" : NumberLong("4611686018427387902") } on : mongo-perf-shrd1 { "t" : 2, "i" : 4 } 
            { "shKeyId" : NumberLong("4611686018427387902") } -->> { "shKeyId" : { "$maxKey" : 1 } } on : mongo-perf-shrd1 { "t" : 2, "i" : 5 } 

Shard Distribution output

   mongos> db.Foo.getShardDistribution();

Shard mongo-perf-shrd1 at mongo-perf-shrd1/ip1:27017,ip2,ip3:27017
 data : 6KiB docs : 100 chunks : 2
 estimated data per chunk : 3KiB
 estimated docs per chunk : 50

Shard mongo-perf-shrd2 at mongo-perf-shrd2/ip4:27017,ip5:27017,ip6:27017
 data : 0B docs : 0 chunks : 2
 estimated data per chunk : 0B
 estimated docs per chunk : 0

Totals
 data : 6KiB docs : 100 chunks : 4
 Shard mongo-perf-shrd1 contains 100% data, 100% docs in cluster, avg obj size on shard : 64B
 Shard mongo-perf-shrd2 contains 0% data, 0% docs in cluster, avg obj size on shard : NaNGiB
  • Can you describe what you mean by "the sharding didn't work"? – shelman Apr 17 '13 at 16:08
  • I was expecting 50 records to go into shard1 and the other 50 records to go into shard2, but all 100 records were saved into shard1. – Samba Apr 17 '13 at 16:52
  • Could you post the output of `sh.status()` run on the mongos? – shelman Apr 17 '13 at 19:05
  • Thanks shelman. I have added sh.status output into the question – Samba Apr 18 '13 at 00:09
  • Why do you think all 100 records are saved in shard1? It looks like, given the output, that the chunks are evenly spread amongst the two shards (2 chunks in each shard) – shelman Apr 18 '13 at 15:43
  • db.Foo.getShardDistribution() shows that all 100 documents are in mongo-perf-shrd1 (I have added the output to the question). – Samba Apr 18 '13 at 20:57
  • Here the size of the data is also important: `The default chunk size for a sharded cluster is 64 megabytes.` With the code above, the documents are very small, so as the distribution shows, all the data you have inserted fits within the existing chunks. If the size of a chunk exceeds the set limit, it will split and a portion will be migrated to another shard/chunk. – Albatross Sep 26 '19 at 17:20
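
As a side note to the last comment: for a small mock test with tiny documents, one way to see splits and migrations sooner is to lower the chunk size. A minimal sketch run against the config database through mongos (1 MB is only an illustrative value; the default is 64 MB):

use config
db.settings.save( { _id : "chunksize", value : 1 } )   // chunk size in MB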

1 Answer


You have a monotonically increasing shard key, which is not a good choice: http://docs.mongodb.org/manual/core/sharded-cluster-internals/

For just a mock test, it is enough to shard on the default _id field.
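
A minimal sketch of that suggestion, assuming the collection is created fresh (the shard key of an already-sharded collection cannot be changed):

sh.enableSharding("test");
sh.shardCollection("test.Foo", { _id : "hashed" });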

  • **2** http://docs.mongodb.org/manual/core/sharded-clusters/#shard-keys "Hashed keys work well with fields that increase monotonically" – Samba Apr 17 '13 at 11:17
  • **1** Another important factor to consider here is the `query pattern`. I don't see any problem with a monotonically increasing shard key if the ranges are evenly distributed, which can be achieved with a process called [pre-sharding](https://docs.mongodb.com/manual/tutorial/create-chunks-in-sharded-cluster). If the possible range is known beforehand, we can use Python (or any other language) to calculate the number of shards ahead of time and assign key ranges to them. This is useful when we have range queries `from id1 to idN`. – Albatross Sep 26 '19 at 17:27
  • **2** Hashed sharding will certainly lead to even distribution, but it is also important to mention `initial chunks` here to reduce the load of chunk migration. However, this is not useful if the queries involve ranges: e.g. id1 may be part of shard 1 while id2 may be part of shard n, and fetching that range will be an expensive operation. – Albatross Sep 26 '19 at 17:29
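
A hedged sketch of the `initial chunks` idea from the last comment, using the numInitialChunks option for a hashed shard key (it must be run against the admin database on an empty collection; the value 8 is only illustrative):

db.adminCommand( {
    shardCollection : "test.Foo",
    key : { shKeyId : "hashed" },
    numInitialChunks : 8
} );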