
My MongoDB sharded cluster's ingestion performance doesn't scale up when I add a new shard.

I have a small cluster set up with 1 mongos + 1 config replica set (3 nodes) + N shard replica sets (3 nodes each).

The mongos runs on a dedicated Kubernetes node, each mongod process hosting a shard has its own dedicated k8s node, while the config server mongod processes run wherever they happen to be scheduled.

The cluster is used mainly for GridFS file hosting, with a typical file being around 100 MB.

I am doing stress tests with 1, 2 and 3 shards to see if it scales properly, and it doesn't.

If I start a brand-new cluster with 2 shards and run my test, it ingests files at (approximately) twice the speed I had with 1 shard. But if I start the cluster with 1 shard, run the test, then add 1 more shard (2 shards in total) and run the test again, the ingestion speed is approximately the same as before with 1 shard.

Looking at where chunks go: when I start the cluster with 2 shards right away, the load is evenly balanced between the shards. If I start with 1 shard and add a second one after some insertions, the chunks tend to all land on the old shard and the balancer has to move them to the second shard later.
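
I'm checking chunk placement roughly like this (the database name is illustrative):

    // mongo shell, connected to the mongos
    use mydb
    db.fs.chunks.getShardDistribution()   // data and chunk counts per shard
    sh.status()                           // chunk ranges, owning shards, balancer state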

Quick facts:

  • chunksize 1024 MB

  • sharding key is GridFS file_id, hashed
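
In mongo shell terms that setup corresponds roughly to this (the database name is illustrative; the field on the GridFS chunks collection is files_id):

    // mongo shell, connected to the mongos
    // raise the chunk size to 1024 MB (value is in MB, stored in config.settings)
    use config
    db.settings.updateOne(
        { _id: "chunksize" },
        { $set: { value: 1024 } },
        { upsert: true }
    )

    // shard the GridFS chunks collection on a hashed files_id
    sh.enableSharding("mydb")
    sh.shardCollection("mydb.fs.chunks", { files_id: "hashed" })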

Federico Bonelli

1 Answer


This is due to how hashed sharding and the balancer work.

In an empty collection (from Shard an Empty Collection):

The sharding operation creates empty chunks to cover the entire range of the shard key values and performs an initial chunk distribution. By default, the operation creates 2 chunks per shard and migrates across the cluster.

So if you execute sh.shardCollection() on a cluster with x shards, it will create 2 chunks per shard and distribute them across the shards, for a total of 2x chunks across the cluster. Since the collection is empty, moving the chunks around takes little time. Your ingestion will now be distributed evenly across the shards (assuming other things, e.g. good cardinality of the hashed field).
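
For example, on an empty GridFS chunks collection (the namespace is illustrative; numInitialChunks is optional and overrides the default of 2 chunks per shard):

    // mongo shell, connected to the mongos
    sh.shardCollection(
        "mydb.fs.chunks",
        { files_id: "hashed" },   // hashed shard key
        false,                    // unique: not supported with hashed keys anyway
        { numInitialChunks: 8 }   // pre-split into 8 empty chunks spread across the shards
    )
    sh.status()   // shows the empty chunks already distributed across the shards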

Now if you add a new shard after the chunks were created, that shard starts out empty and the balancer will start migrating chunks to it according to the migration thresholds. In a populated collection, this process may take a while to finish.
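
Schematically (the replica set name and host are illustrative):

    // mongo shell, connected to the mongos
    sh.addShard("shard2rs/shard2-node-0:27017")   // the new, initially empty shard
    sh.getBalancerState()     // true if the balancer is enabled
    sh.isBalancerRunning()    // true while a balancing round is migrating chunks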

If you do another ingestion while the balancer is still moving chunks around (chunks which may no longer be empty), the cluster is doing two different jobs at the same time: 1) ingestion, and 2) balancing.

When you do this with 1 shard and then add another shard, it's likely that the chunks you're ingesting into are still located on shard 1 and haven't moved to the new shard yet, so most data will go into that shard.

Thus you should wait until the cluster is balanced after adding the new shard before doing another ingestion. After it's balanced, the ingestion load should be more evenly distributed.
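
One way to check that things have evened out before starting the next ingestion (the ns filter assumes a MongoDB version where config.chunks still stores the namespace; the database name is illustrative):

    // mongo shell, connected to the mongos
    db.getSiblingDB("config").chunks.aggregate([
        { $match: { ns: "mydb.fs.chunks" } },
        { $group: { _id: "$shard", chunks: { $sum: 1 } } }
    ])
    // resume ingestion once chunk counts per shard are roughly equal
    // and sh.isBalancerRunning() returns false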

Note: since your shard key is file_id, I'm assuming that each file is approximately the same size (~100 MB). If some files are much larger than others, some chunks will be busier than others as well.

kevinadi
  • this explanation confirms everything I'm observing, and matches the documentation. I'll try increasing the number of initial chunks so that (potentially) every new insertion goes to a new chunk, hence getting a fair probability of landing on either the new or the old shard – Federico Bonelli Aug 26 '19 at 13:10
  • it doesn't work: for some reason moving empty chunks is fairly slow, so having a lot of them slows things down instead of speeding them up – Federico Bonelli Aug 26 '19 at 15:51
  • A chunk move requires two things: 1) a metadata change on the config servers, and 2) the actual data moving between shards. If you have a lot of chunks to move at once, it will be bottlenecked on the config servers, even if the chunks are empty. I have seen this before when there were chunks to move in the hundreds. – kevinadi Aug 26 '19 at 22:16
  • I've added a different question on DBA for my case: https://dba.stackexchange.com/q/246339/36427 – Federico Bonelli Aug 27 '19 at 20:34