I have a MongoDB production cluster running 2.6.5 that I recently migrated from two shards to three. I had been running with two shards for about a year. Each shard is a three-server replica set, and I have one collection sharded.
The sharded collection is about 240 GB, and with the new shard the chunks are now evenly distributed at 2922 per shard. My production environment appears to be performing just fine, and there is no problem accessing data.
[Note: 1461 appears to be the number of chunks that were moved from rs0 and shard1 to make up the 2922 now on shard2.]
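For reference, the per-shard chunk counts can be confirmed from a mongos with something like the following (the namespace mydb.mycoll is a placeholder for my actual sharded collection):

    // Connected to a mongos; count chunks per shard for the sharded collection.
    // "mydb.mycoll" is a placeholder namespace.
    use config
    db.chunks.aggregate([
        { $match: { ns: "mydb.mycoll" } },
        { $group: { _id: "$shard", chunks: { $sum: 1 } } }
    ])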
My intent was to shard three more collections, so I started with one and expected it to spread across the shards. But no - I ended up with this repeating error:
2014-10-29T20:26:35.374+0000 [Balancer] moveChunk result: { cause: { ok: 0.0, errmsg: "can't accept new chunks because there are still 1461 deletes from previous migration" },
ok: 0.0, errmsg: "moveChunk failed to engage TO-shard in the data transfer: can't accept new chunks because there are still 1461 deletes from previous migration" }
2014-10-29T20:26:35.375+0000 [Balancer] balancer move failed: { cause: { ok: 0.0, errmsg: "can't accept new chunks because there are still 1461 deletes from previous migration" },
ok: 0.0, errmsg: "moveChunk failed to engage TO-shard in the data transfer: can't accept new chunks because there are still 1461 deletes from previous migration" } from: rs0 to: shard1 chunk: min: { account_id: MinKey } max: { account_id: -9218254227106808901 }
With a little research I figured I should just give it some time, since it obviously needs to clean things up after the move. I ran sh.disableBalancing("collection-name") to stop the balancer from repeatedly trying to move chunks of the new collection and generating these errors. sh.getBalancerState() returns true, as does sh.isBalancerRunning(). However, I have now given it 24 hours and the error message is unchanged; I would have expected it to have cleaned up/deleted at least 1 of the 1461 deletes it needs to finish.
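For completeness, the commands I ran looked roughly like this (the namespace is a placeholder for my real database and collection names):

    // Placeholder namespace for the new collection I was trying to shard.
    sh.disableBalancing("mydb.new_collection")
    sh.getBalancerState()     // returns true
    sh.isBalancerRunning()    // also returns true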
- Is this common behavior in the 2.6 world? Am I going to have to manhandle all of my sharded collections every time I grow the environment by another shard?
- Any idea how to get this cleanup going? Or should I just step down the primary on shard1, which seems to be where the problem is? (See the sketch after this list.)
- If I do step down the primary, will I still have data to delete/clean up on the secondary anyway? Or will stepping down take care of things so I can start sharding some new collections?
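If stepping down is the right approach, I assume it would be something like the following, run from a mongo shell connected to the current shard1 primary (the timeout is just an example value):

    // On the shard1 primary: step down and do not seek re-election for 120 seconds.
    rs.stepDown(120)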
Thanks in advance for any insights.