
We ran a mock test on a sharded MongoDB environment. When the shard key value is a small integer, the collection does not distribute across the shards, but when the shard key is a large integer it works fine. Please read on...

Code used to insert records from the mongos shell:

var shId = 15;
for (var i = 0; i < 100; i++) {
    if (i % 50 == 0) {
        shId = shId + 1;
    }
    db.Foo.insert({ shKeyId: shId, text: "this is a test" });
}
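
(Note: starting from shId = 15, the loop above produces only two distinct shKeyId values, 16 and 17, with 50 documents each. If useful, this can be confirmed from the mongos shell:)

db.Foo.distinct("shKeyId");       // expected: [ 16, 17 ]
db.Foo.count({ shKeyId : 16 });   // expected: 50
db.Foo.count({ shKeyId : 17 });   // expected: 50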

With shId starting at 15, the Foo collection did not split across the two shards.

Environment: two shards, each shard a replica set with one primary and two secondary mongod instances. The config server runs on one of the shard hosts.

Sharding was enabled on the 'Foo' collection with shKeyId as a hashed shard key:

db.runCommand({ shardcollection : "test.Foo", key : { shKeyId : "hashed" } });
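
For reference, the shell helper form should be equivalent (sh.enableSharding is shown only in case the database had not already been enabled for sharding):

sh.enableSharding("test");
sh.shardCollection("test.Foo", { shKeyId : "hashed" });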

sh.status() output

mongos> sh.status();
--- Sharding Status --- 
  sharding version: {
    "_id" : 1,
    "version" : 3,
    "minCompatibleVersion" : 3,
    "currentVersion" : 4,
    "clusterId" : ObjectId("516ea48e979736fd306973c9")
}
  shards:
    {  "_id" : "mongo-perf-shrd1",  "host" : "mongo-perf-shrd1/sh1-prim-ip:27017,sh1-sec1-ip:27017,sh1-sec2-ip:27017" }
    {  "_id" : "mongo-perf-shrd2",  "host" : "mongo-perf-shrd2/sh2-prim-ip:27017,sh2-sec1-ip:27017,sh2-sec2-ip:27017" }
  databases:
    {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }

    {  "_id" : "test",  "partitioned" : true,  "primary" : "mongo-perf-shrd2" }
        test.Foo
            shard key: { "shKeyId" : "hashed" }
            chunks:
                mongo-perf-shrd2    2
                mongo-perf-shrd1    2
            { "shKeyId" : { "$minKey" : 1 } } -->> { "shKeyId" : NumberLong("-4611686018427387902") } on : mongo-perf-shrd2 { "t" : 2, "i" : 2 } 
            { "shKeyId" : NumberLong("-4611686018427387902") } -->> { "shKeyId" : NumberLong(0) } on : mongo-perf-shrd2 { "t" : 2, "i" : 3 } 
            { "shKeyId" : NumberLong(0) } -->> { "shKeyId" : NumberLong("4611686018427387902") } on : mongo-perf-shrd1 { "t" : 2, "i" : 4 } 
            { "shKeyId" : NumberLong("4611686018427387902") } -->> { "shKeyId" : { "$maxKey" : 1 } } on : mongo-perf-shrd1 { "t" : 2, "i" : 5 } 

Shard Distribution output

   mongos> db.Foo.getShardDistribution();

Shard mongo-perf-shrd1 at mongo-perf-shrd1/ip1:27017,ip2,ip3:27017
 data : 6KiB docs : 100 chunks : 2
 estimated data per chunk : 3KiB
 estimated docs per chunk : 50

Shard mongo-perf-shrd2 at mongo-perf-shrd2/ip4:27017,ip5:27017,ip6:27017
 data : 0B docs : 0 chunks : 2
 estimated data per chunk : 0B
 estimated docs per chunk : 0

Totals
 data : 6KiB docs : 100 chunks : 4
 Shard mongo-perf-shrd1 contains 100% data, 100% docs in cluster, avg obj size on shard : 64B
 Shard mongo-perf-shrd2 contains 0% data, 0% docs in cluster, avg obj size on shard : NaNGiB
  • Can you describe what you mean by "the sharding didn't work"? – shelman Apr 17 '13 at 16:08
  • I was expecting 50 records to go into shard1 and the other 50 records to go into shard2, but all 100 records were saved into shard1. – Samba Apr 17 '13 at 16:52
  • Could you post the output of `sh.status()` run on the mongos? – shelman Apr 17 '13 at 19:05
  • Thanks shelman. I have added sh.status output into the question – Samba Apr 18 '13 at 00:09
  • Why do you think all 100 records are saved in shard1? It looks like, given the output, that the chunks are evenly spread amongst the two shards (2 chunks in each shard) – shelman Apr 18 '13 at 15:43
  • db.Foo.getShardDistribution() shows that all 100 documents are in mongo-perf-shrd1 (I have added the output to the question). – Samba Apr 18 '13 at 20:57
  • Here the size of the data is also important: `The default chunk size for a sharded cluster is 64 megabytes.` With the code above, the documents are very small, so as the distribution shows, all the data you have inserted fits within the existing chunks. If the size of a chunk exceeds the set limit, it will split and a portion will be migrated to another shard/chunk. – Albatross Sep 26 '19 at 17:20
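
As a side note to the last comment: for a small mock test with tiny documents, one way to see splits and migrations sooner is to lower the chunk size. A minimal sketch run against the config database through mongos (1 MB is only an illustrative value; the default is 64 MB):

use config
db.settings.save( { _id : "chunksize", value : 1 } )   // chunk size in MB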

1 Answer


You have a monotonically increasing shard key, which is not a good choice: http://docs.mongodb.org/manual/core/sharded-cluster-internals/

For just a mock test, it is enough to shard on the default _id field.
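
A minimal sketch of that suggestion, assuming the collection is created fresh (the shard key of an already-sharded collection cannot be changed):

sh.enableSharding("test");
sh.shardCollection("test.Foo", { _id : "hashed" });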

  • **2** http://docs.mongodb.org/manual/core/sharded-clusters/#shard-keys "Hashed keys work well with fields that increase monotonically" – Samba Apr 17 '13 at 11:17
  • **1** Another important factor to consider here is the `query pattern`. I don't see any problem with a monotonically increasing shard key if the ranges are evenly distributed, which can be achieved with a process called [pre-sharding](https://docs.mongodb.com/manual/tutorial/create-chunks-in-sharded-cluster). If the possible range is known beforehand, we can use Python (or any other language) to calculate the number of shards ahead of time and assign key ranges to them. This is useful when we have range queries `from id1 to idN`. – Albatross Sep 26 '19 at 17:27
  • **2** Hashed sharding will certainly lead to even distribution, but it is also important to mention `initial chunks` here to reduce the load of chunk migration. However, this is not useful if the queries involve ranges: e.g. id1 may be part of shard 1 while id2 may be part of shard n, and fetching that range will be an expensive operation. – Albatross Sep 26 '19 at 17:29
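
A hedged sketch of the `initial chunks` idea from the last comment, using the numInitialChunks option for a hashed shard key (it must be run against the admin database on an empty collection; the value 8 is only illustrative):

db.adminCommand( {
    shardCollection : "test.Foo",
    key : { shKeyId : "hashed" },
    numInitialChunks : 8
} );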