
I'm a bit confused as to how this works.

When sharding MySQL, we kept some tables, usually small ones holding reference data, in full on every shard. This was to enable joins.

If we have small collections in MongoDB that we don't shard in a sharded setup, what happens to them? Are they copied to every shard, or do they just stay on the first shard?

This strikes me as a potential bottleneck if all processes in a heavily sharded system with many application servers were hitting one server.

Eddie C.
CargoMeister
  • You have to tell MongoDB explicitly (via a command) which collections are sharded. Don't shard the smaller collections, and each of them will remain on a single shard. Any query against such a collection is sent only to the shard holding it. Refer: `http://docs.mongodb.org/manual/tutorial/deploy-shard-cluster/#sharding-setup-shard-collection` – Abhishek Kumar Aug 28 '13 at 05:53
  • Collections that you do not shard reside on the first shard. Yes, it could be a bottleneck; there has been talk about distributing those collections as well, but currently MongoDB does not. – Sammaye Aug 28 '13 at 07:10
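
To make the point in these comments concrete, here is a minimal sketch in Python of the admin command documents involved. The database and collection names (`shop`, `orders`, `statuses`), the shard key, and the mongos address in the commented-out part are assumptions for illustration only; `enableSharding` and `shardCollection` are the actual MongoDB admin commands.

```python
# Sketch only: the database and collection names are hypothetical.

def enable_sharding_cmd(db_name):
    """Admin command document that enables sharding for one database."""
    return {"enableSharding": db_name}


def shard_collection_cmd(namespace, shard_key):
    """Admin command document that shards one collection on shard_key.

    The command name must be the first key; Python 3.7+ dicts keep
    insertion order, so a plain dict is enough here.
    """
    return {"shardCollection": namespace, "key": shard_key}


commands = [
    enable_sharding_cmd("shop"),
    shard_collection_cmd("shop.orders", {"customer_id": 1}),
]
# "shop.statuses" is deliberately absent: no shardCollection command is
# issued for it, so it stays unsharded on the database's primary shard.

# Against a real cluster (assumed mongos address, requires pymongo):
# from pymongo import MongoClient
# client = MongoClient("mongodb://localhost:27017")
# for cmd in commands:
#     client.admin.command(cmd)
```

Only collections named in a `shardCollection` command are distributed; everything else in the database is served by one shard.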

1 Answer


With MongoDB's auto-sharding feature, a sharded collection is distributed more or less evenly across all the shards you have.

For collections that you do not shard, you can specify the primary shard on which they reside. The primary shard is assigned per database, so it is a database-level setting: it can be moved, and it can differ between databases.
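
As a rough illustration of the per-database primary shard, the sketch below looks up a database's primary in documents shaped like those in the `config.databases` collection and builds a `movePrimary` admin command. The sample documents, shard names, and helper function names are hypothetical; only the `movePrimary` command and the `_id`/`primary` fields are taken from MongoDB itself.

```python
def primary_shard_of(config_databases, db_name):
    """Return the primary shard recorded for db_name, or None if unknown.

    config_databases is any iterable of documents with "_id" (database
    name) and "primary" (shard name), as found in config.databases.
    """
    for doc in config_databases:
        if doc["_id"] == db_name:
            return doc["primary"]
    return None


def move_primary_cmd(db_name, target_shard):
    """Admin command document that moves a database's primary shard."""
    return {"movePrimary": db_name, "to": target_shard}


# Hypothetical sample data: two databases with different primary shards.
sample = [
    {"_id": "shop", "primary": "shard0000"},
    {"_id": "reports", "primary": "shard0001"},
]

current = primary_shard_of(sample, "shop")
cmd = move_primary_cmd("shop", "shard0001")
```

Moving the primary relocates the unsharded collections of that database, so spreading databases over different primaries is one way to avoid concentrating all unsharded data on a single shard.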

There is also the notion of shard tagging, with which you can influence where sharded collections are placed. Basically, you can constrain a collection, or part of a collection, to be stored on a specific set of shards. (Reference)
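
The idea behind tagging can be shown with a toy model in Python. This is not MongoDB's balancer, only a sketch under the assumption that each tag range covers a half-open interval of shard-key values (lower bound inclusive, upper bound exclusive) and that values outside every tagged range may live on any shard. The shard names, tags, and ranges are made up.

```python
def eligible_shards(value, tag_ranges, shard_tags):
    """Return the sorted shard names allowed to hold a shard-key value.

    tag_ranges: list of (low, high, tag) with low inclusive, high exclusive.
    shard_tags: dict mapping shard name to the set of tags on that shard.
    """
    for low, high, tag in tag_ranges:
        if low <= value < high:
            return sorted(s for s, tags in shard_tags.items() if tag in tags)
    # Not covered by any tag range: no placement constraint in this model.
    return sorted(shard_tags)


# Hypothetical cluster: two tagged shards and one untagged shard.
shard_tags = {
    "shard0000": {"EU"},
    "shard0001": {"US"},
    "shard0002": set(),
}
tag_ranges = [(0, 1000, "EU"), (1000, 2000, "US")]

eu_shards = eligible_shards(10, tag_ranges, shard_tags)
us_shards = eligible_shards(1000, tag_ranges, shard_tags)
any_shards = eligible_shards(5000, tag_ranges, shard_tags)
```

Tagging the whole key range of a collection with a tag held by a single shard would pin that collection to one shard, which is close to how an unsharded collection behaves.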

Eddie C.
attish
  • I am not sure whether tag-aware sharding works for non-sharded collections; I never tested it. However, judging by the fact that the chunks of a non-sharded collection are never balanced, I would say no. – Sammaye Aug 28 '13 at 08:13
  • That is why I wrote that tagging lets you influence where sharded collections are placed. However, you could place a sharded collection entirely on one shard, which would be nearly (probably exactly) the behaviour of a non-sharded collection. I am quite sure you cannot tag an unsharded collection. – attish Aug 28 '13 at 09:15
  • Thanks all. That's what I was concerned about. To me, it's a bit of a potential threat. Of course, you don't have joins to worry about in Mongo, which is one of the primary reasons for copying the smaller tables around to the various shards. The other way to deal with them is to set up a different, replicated environment. I'm guessing that in a Mongo environment you tend not to store statuses and such via a unique key; instead you store the actual status text you want to appear in a report, etc. That is only a problem if you want to change the status text. – CargoMeister Aug 29 '13 at 15:44
  • One needs to be careful with indexing and sharding. Once a collection is sharded, unique indexes no longer work as expected, so if you just auto-shard arbitrary collections, without considering unique indexes, some stuff may break. – nilskp Aug 15 '14 at 21:20