I understand that hash sharding can be enabled at the collection level within a database, based on the shard key specified for that collection.

This ensures that records for that collection are distributed across all the shards.

I understand what happens with one collection. What about the other collections?

  • Is all the data in the other collections stored on a single shard?
  • Is it replicated across all the shards?
  • Or is it also split and spread across all the shards?
codeHead
  • This question belongs on [dba.stackexchange.com](http://dba.stackexchange.com). StackOverflow is for programming-related topics only, such as solving "code" problems. Other questions can be posted to the other stackexchange network sites. – Neil Lunn Sep 17 '14 at 12:02

1 Answer

The other collections will reside on a single shard (known as the primary shard) unless you decide to shard them as well. The primary shard is set at the database level rather than the collection level, so all non-sharded collections in a given database share the same primary shard. You can see the primary for any database in the sh.status() output, as in the example below:

```
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
    "_id" : 1,
    "version" : 4,
    "minCompatibleVersion" : 4,
    "currentVersion" : 5,
    "clusterId" : ObjectId("54185b2c2a2835b6e47f7984")
}
  shards:
    {  "_id" : "shard0000",  "host" : "localhost:30000" }
  databases:
    {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
    {  "_id" : "shardTest",  "partitioned" : true,  "primary" : "shard0000" }
        shardTest.foo
            shard key: { "_id" : 1 }
            chunks:
                shard0000   1
            { "_id" : { "$minKey" : 1 } } -->> { "_id" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 0)
    {  "_id" : "bar",  "partitioned" : true,  "primary" : "shard0000" }
        bar.data
            shard key: { "_id" : 1 }
            chunks:
                shard0000   1
            { "_id" : { "$minKey" : 1 } } -->> { "_id" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 0)
    {  "_id" : "foo",  "partitioned" : true,  "primary" : "shard0000" }
        foo.data
            shard key: { "_id" : 1 }
            chunks:
                shard0000   9
```

In this example there is only one shard (shard0000), so it is the primary for every database ("primary" : "shard0000") except admin, whose primary is listed as config. That is a special case: it resides on the config servers. The primary shard for a database is chosen when the database is created.
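If you only want the primary for each database rather than the full status output, the same information can be grouped with a few lines of JavaScript. This is a minimal sketch: `primariesByShard` is a hypothetical helper (not part of MongoDB), and the sample array simply copies the database entries from the output above.

```javascript
// Hypothetical helper: group database names by their primary shard.
// `databases` is an array of documents shaped like the entries in the
// "databases" section of sh.status(): { _id, partitioned, primary }.
function primariesByShard(databases) {
  var result = {};
  databases.forEach(function (d) {
    if (!result[d.primary]) {
      result[d.primary] = [];
    }
    result[d.primary].push(d._id);
  });
  return result;
}

// Sample input copied from the sh.status() output in this answer.
var sample = [
  { _id: "admin", partitioned: false, primary: "config" },
  { _id: "shardTest", partitioned: true, primary: "shard0000" },
  { _id: "bar", partitioned: true, primary: "shard0000" },
  { _id: "foo", partitioned: true, primary: "shard0000" }
];

var grouped = primariesByShard(sample);
// grouped.shard0000 -> ["shardTest", "bar", "foo"]
// grouped.config    -> ["admin"]
```

Against a live cluster, and assuming the config database's databases collection holds these same documents, you could pass it `db.getSiblingDB("config").databases.find().toArray()` from a shell connected to a mongos.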

Hence, if you only had one shard, created all your databases first, and then added more shards later, every database created before the new shards were added will have its primary set to that first shard (there was nothing else to choose). Any database created once you have multiple shards could end up with any shard as its primary. The primary is essentially selected by round robin, but each mongos has its own idea of where it is in that round-robin sequence.
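If you would rather spread one of the other collections across the shards than leave it on the primary, you have to shard it explicitly. The sketch below wraps the two shell calls involved, `sh.enableSharding()` and `sh.shardCollection()` with a hashed key. The wrapper name `shardWithHashedKey`, the stand-in `fakeSh` object and the collection name `orders` are assumptions made for illustration. In a real mongos shell you would pass the shell's own `sh` object instead.

```javascript
// Hypothetical wrapper (not a MongoDB function): shard one collection
// on a hashed key. `sh` is the shell's sharding helper, or any object
// exposing enableSharding(dbName) and shardCollection(namespace, key).
function shardWithHashedKey(sh, dbName, collName, field) {
  var key = {};
  key[field] = "hashed";            // e.g. { _id: "hashed" }
  sh.enableSharding(dbName);        // sharding is enabled per database
  return sh.shardCollection(dbName + "." + collName, key);
}

// Stand-in for the shell's `sh` object so the sketch runs without a
// cluster; it only records the calls it receives.
var calls = [];
var fakeSh = {
  enableSharding: function (name) {
    calls.push(["enableSharding", name]);
  },
  shardCollection: function (namespace, key) {
    calls.push(["shardCollection", namespace, key]);
    return { ok: 1 };
  }
};

// "orders" is a made-up collection name used for illustration.
shardWithHashedKey(fakeSh, "shardTest", "orders", "_id");
```

On a database that already shows "partitioned" : true, the `enableSharding` step may be unnecessary. Collections you do not shard this way stay on the database's primary shard.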

Adam Comerford