Mongodb shard balance not work properly, with a lot of moveChunk error reported

Question

We have a mongoDb cluster with 3 shards, each shard is a replica set contains 3 nodes, the mongoDb version we use is 3.2.6. we have a big database with size about 230G, which contains about 5500 collections. we found that about 2300 collections are not balanced where other 3200 collections are evenly distributed to 3 shards.

below is the result of sh.status (the whole result is too big, i just post part of them):

mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
    "_id" : 1,
    "minCompatibleVersion" : 5,
    "currentVersion" : 6,
    "clusterId" : ObjectId("57557345fa5a196a00b7c77a")
}
  shards:
    {  "_id" : "shard1",  "host" : "shard1/10.25.8.151:27018,10.25.8.159:27018" }
    {  "_id" : "shard2",  "host" : "shard2/10.25.2.6:27018,10.25.8.178:27018" }
    {  "_id" : "shard3",  "host" : "shard3/10.25.2.19:27018,10.47.102.176:27018" }
  active mongoses:
    "3.2.6" : 1
  balancer:
    Currently enabled:  yes
    Currently running:  yes
        Balancer lock taken at Sat Sep 03 2016 09:58:58 GMT+0800 (CST) by iZ23vbzyrjiZ:27017:1467949335:-2109714153:Balancer
    Collections with active migrations: 
        bdtt.normal_20131017 started at Sun Sep 18 2016 17:03:11 GMT+0800 (CST)
    Failed balancer rounds in last 5 attempts:  0
    Migration Results for the last 24 hours: 
        1490 : Failed with error 'aborted', from shard2 to shard3
        1490 : Failed with error 'aborted', from shard2 to shard1
        14 : Failed with error 'data transfer error', from shard2 to shard1
  databases:
    {  "_id" : "bdtt",  "primary" : "shard2",  "partitioned" : true }
      bdtt.normal_20160908
            shard key: { "_id" : "hashed" }
            unique: false
            balancing: true
            chunks:
                shard2  142
            too many chunks to print, use verbose if you want to force print
        bdtt.normal_20160909
            shard key: { "_id" : "hashed" }
            unique: false
            balancing: true
            chunks:
                shard1  36
                shard2  42
                shard3  46
            too many chunks to print, use verbose if you want to force print
        bdtt.normal_20160910
            shard key: { "_id" : "hashed" }
            unique: false
            balancing: true
            chunks:
                shard1  34
                shard2  32
                shard3  32
            too many chunks to print, use verbose if you want to force print
        bdtt.normal_20160911
            shard key: { "_id" : "hashed" }
            unique: false
            balancing: true
            chunks:
                shard1  30
                shard2  32
                shard3  32
            too many chunks to print, use verbose if you want to force print
        bdtt.normal_20160912
            shard key: { "_id" : "hashed" }
            unique: false
            balancing: true
            chunks:
                shard2  126
            too many chunks to print, use verbose if you want to force print
        bdtt.normal_20160913
            shard key: { "_id" : "hashed" }
            unique: false
            balancing: true
            chunks:
                shard2  118
            too many chunks to print, use verbose if you want to force print
    }

Collection "normal_20160913" is not balanced, I post the getShardDistribution() result of this collection below:

mongos> db.normal_20160913.getShardDistribution()

Shard shard2 at shard2/10.25.2.6:27018,10.25.8.178:27018
 data : 4.77GiB docs : 203776 chunks : 118
 estimated data per chunk : 41.43MiB
 estimated docs per chunk : 1726

Totals
 data : 4.77GiB docs : 203776 chunks : 118
 Shard shard2 contains 100% data, 100% docs in cluster, avg obj size on shard : 24KiB

the balancer process is in running status, and the chunksize is default(64M):

mongos> sh.isBalancerRunning()
true
mongos> use config
switched to db config
mongos> db.settings.find()
{ "_id" : "chunksize", "value" : NumberLong(64) }
{ "_id" : "balancer", "stopped" : false }

And I found a lot of moveChunk error from mogos log, which might be the reason why some of the collections not well balanced, here is the latest part of them:

2016-09-19T14:25:25.427+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:25:59.620+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:25:59.644+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:35:02.701+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:35:02.728+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:42:18.232+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:42:18.256+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:42:27.101+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:42:27.112+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:43:41.889+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }

I tried use moveChunk command manually, it's returns same error:

mongos> sh.moveChunk("bdtt.normal_20160913", {_id:ObjectId("57d6d107edac9244b6048e65")}, "shard3")
{
    "cause" : {
        "ok" : 0,
        "errmsg" : "Not starting chunk migration because another migration is already in progress",
        "code" : 117
    },
    "code" : 117,
    "ok" : 0,
    "errmsg" : "move failed"
}

I am not sure if too many collections created which cause migration overwhelmed? each day about 60-80 new collections will created.

I need help here to answer below questions, any hints will be great:

Why some of the collections not balanced, is it related to the big number of newly created collections?
Is there any command can check the processing migration jobs details? I got a lot of error log which shows some migration jog is running, but I can not find which is running.

You have `data transfer error` in migration, which could point to issues in the networking layer. Also, could you provide more details on each shard? I'm seeing that each shard consists of only two nodes (minimum recommended is three data-bearing nodes). Is this intentional? — kevinadi, Sep 20 '16 at 02:10

score 3 · Answer 1 · edited May 23 '17 at 11:48

Answer my own question: Finally we found the root cause, it's an exactly same issue with this one "MongoDB balancer timeout with delayed replica", caused by abnormal replica set config. When this issue happens, our replica set configuration as below:

shard1:PRIMARY> rs.conf()
{
    "_id" : "shard1",
    "version" : 3,
    "protocolVersion" : NumberLong(1),
    "members" : [
        {
            "_id" : 0,
            "host" : "10.25.8.151:27018",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : {

            },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        },
        {
            "_id" : 1,
            "host" : "10.25.8.159:27018",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : {

            },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        },
        {
            "_id" : 2,
            "host" : "10.25.2.6:37018",
            "arbiterOnly" : true,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : {

            },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        },
        {
            "_id" : 3,
            "host" : "10.47.114.174:27018",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : true,
            "priority" : 0,
            "tags" : {

            },
            "slaveDelay" : NumberLong(86400),
            "votes" : 1
        }
    ],
    "settings" : {
        "chainingAllowed" : true,
        "heartbeatIntervalMillis" : 2000,
        "heartbeatTimeoutSecs" : 10,
        "electionTimeoutMillis" : 10000,
        "getLastErrorModes" : {

        },
        "getLastErrorDefaults" : {
            "w" : 1,
            "wtimeout" : 0
        },
        "replicaSetId" : ObjectId("5755464f789c6cd79746ad62")
    }
}

There are 4 nodes inside the replica set: one primary, one slave, one arbiter and one 24 hours delayed slave. that makes 3 nodes to be majority, since arbiter have no data present, balancer need to wait the delayed slave to satisfy the write concern(make sure the receiver shard have received the chunk).

There are several ways to solve the problem. We just removed the arbiter, the balancer works fine now.

score 0 · Answer 2 · answered Sep 20 '16 at 01:25

I'm going to speculate but my guess is that your collections are very imbalanced and are currently being balanced by chunk migration (It might take a long time). Hence your manual chunk migration is queued but not executed right away.

Here are a few points that might clarify a bit more:

One chunk at a time: MongoDB chunk migration happens in a queue mechanism and only one chunk at a time are migrated.
Balancer lock: Balancer lock information might give you some more idea of what is being migrated. You should also be able to see log entries is chunk migration in your mongos log files.

One option you have is to do some pre-splitting in your collections. The pre-splitting process essentially configured an empty collection to start balanced and avoid being imbalanced in the first place. Because once they get imbalanced the chunk migration process might not be your friend.

Also, you might want to revisit your shard keys. You are probably doing something wrong with your shard keys that's causing a lot of imbalance.

Plus, your data size doesn't seem too large to me to warrant a sharded configuration. Remember, never do a sharded configuration unless you are forced by your data size/working set size attributes. Because sharding is not free (you are probably already feeling the pain).

Mongodb shard balance not work properly, with a lot of moveChunk error reported

2 Answers2