
Scenario:

I had a standalone MongoDB server v3.4.x with several DBs and collections. As the plan was to upgrade to the latest 4.2.x, I created a mongodump of all DBs.

I then created a sharded cluster consisting of a config server (replica set), a shard-1 server (replica set) and a shard-2 server (replica set), all on MongoDB v4.2.x.
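For reference, before restoring it is worth confirming that mongos actually sees both shards. A minimal check, assuming you can reach the mongos (the host name below is a placeholder):

    # Run against the mongos router; should list the config servers and both shard replica sets.
    mongo --host mongos.example.net --port 27017 --eval 'sh.status()'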

Issue:

Now when I try to restore the dump, it only restores partially every time. If I try to restore a single DB, it fails with the same error. However, restoring a specific collection of a specific DB always works fine. The problem is that there are many collections across many DBs, so I cannot do it for each one individually, and every attempt fails at a different progress percentage/collection/DB.

Error:

2020-02-07T19:07:03.822+0000 [#####################...] myproduct_new.chats        68.1MB/74.8MB (91.0%)
2020-02-07T19:07:03.851+0000 [##########              ] myproduct_new.metaCrashes  216MB/502MB (42.9%)
2020-02-07T19:07:03.876+0000 [##################      ] myproduct_new.feeds        152MB/196MB (77.4%)

panic: close of closed channel

goroutine 25 [running]:
github.com/mongodb/mongo-tools/mongorestore.(*MongoRestore).RestoreCollectionToDB(0xc0001a0000, 0xc000234540, 0xc, 0xc00023454d, 0x900, 0x7fa5503e21f0, 0xc00020b890, 0x1f66e326, 0x0, ...)
        /data/mci/533e19bcc94a47bf738334351cf58a07/src/src/mongo/gotools/src/github.com/mongodb/mongo-tools/mongorestore/restore.go:503 +0x49b
github.com/mongodb/mongo-tools/mongorestore.(*MongoRestore).RestoreIntent(0xc0001a0000, 0xc00022f9e0, 0x0, 0x0, 0x0, 0x0)
        /data/mci/533e19bcc94a47bf738334351cf58a07/src/src/mongo/gotools/src/github.com/mongodb/mongo-tools/mongorestore/restore.go:311 +0xbe9
github.com/mongodb/mongo-tools/mongorestore.(*MongoRestore).RestoreIntents.func1(0xc0001a0000, 0xc000146420, 0x3)
        /data/mci/533e19bcc94a47bf738334351cf58a07/src/src/mongo/gotools/src/github.com/mongodb/mongo-tools/mongorestore/restore.go:126 +0x1cb
created by github.com/mongodb/mongo-tools/mongorestore.(*MongoRestore).RestoreIntents
        /data/mci/533e19bcc94a47bf738334351cf58a07/src/src/mongo/gotools/src/github.com/mongodb/mongo-tools/mongorestore/restore.go:109 +0x12d
ubuntu@ip-00-xxx-xxx-00:/usr/local/backups/Dev_backup_07-02-2020$

Question:

I am connecting to mongos and trying to restore. Currently, sharding is not yet enabled for any DB. Can anyone shed some light on what's going wrong, or on how to restore the dump?
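As a rough sketch of the kind of invocation involved (host, port and dump path are placeholders based on the log above; --numParallelCollections=1 is just a commonly suggested workaround for concurrency-related panics, not a confirmed fix):

    # Restore everything through mongos, one collection at a time.
    mongorestore --host mongos.example.net --port 27017 \
        --numParallelCollections=1 \
        /usr/local/backups/Dev_backup_07-02-2020

    # Or restore a single database from the same dump directory.
    mongorestore --host mongos.example.net --port 27017 \
        --nsInclude='myproduct_new.*' \
        /usr/local/backups/Dev_backup_07-02-2020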

Amol M Kulkarni
  • Hi Amol, how did you eventually resolve this problem? It would be kind of you to post the solution here as well. I believe you must have resolved this issue by now. – khalidmehmoodawan Nov 24 '21 at 05:54

2 Answers


I had the same problem, and found out that it was caused by a problem with my MongoDB replica set.

Check rs.status() on your database.

If you get the message

Our replica set config is invalid or we are not a member of it

try this answer
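A minimal sketch of that check, assuming you can reach a member of each replica set (config server and shards) directly with the mongo shell; host names are placeholders:

    # Confirm each replica set is healthy: a good member reports "ok" : 1
    # and a state of PRIMARY or SECONDARY.
    mongo --host shard1-node1.example.net --port 27018 --eval 'printjson(rs.status())'

    # If the status instead complains that the replica set config is invalid
    # or the node is not a member of it, the set typically still needs to be
    # initiated (or reconfigured), for example:
    mongo --host shard1-node1.example.net --port 27018 --eval 'rs.initiate()'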

BIN Y
    Hi. Please see how to write a good answer: https://stackoverflow.com/help/how-to-answer - "Links to external resources are encouraged, but please add context around the link so your fellow users will have some idea what it is and why it’s there. Always quote the most relevant part of an important link, in case the target site is unreachable or goes permanently offline." – ArielGro May 25 '20 at 18:54

We faced the exact same issue with the same spec while trying to restore from a mongodump. There is no single definite cause, but it is best to check the factors below.

Check your dump size (BSON) against the free disk space allocated on the cluster. The dump can be 2x to 3x the size of your core Mongo data folder (which is compressed, unlike the raw BSON).
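A quick way to compare the two, assuming the dump lives in the directory shown in the log above and the shard uses the default dbPath of /var/lib/mongodb (both paths are assumptions, adjust to your setup):

    # Total size of the uncompressed BSON dump...
    du -sh /usr/local/backups/Dev_backup_07-02-2020

    # ...versus free space on the volume that holds the shard's data files.
    df -h /var/lib/mongodb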

Check the oplog size configured during cluster creation. For a first-time migration, provide 10-15% of the free disk space as oplog size; you can change this after the migration. This lets the secondaries lag a bit longer and catch up on sync from the oplog faster. E.g. we allocated 3.5 GB for the oplog out of a 75 GB total disk, with 45 GB of total (compressed) data. In real-world usage after the migration, size the oplog for 1-2 hours of write volume.
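A sketch of the two usual ways to set this, using the 3.5 GB figure from the example above (replica set name and host are placeholders; sizes are in megabytes):

    # Option 1: configure the oplog size in mongod.conf before the member starts.
    # replication:
    #   replSetName: shard1rs
    #   oplogSizeMB: 3584

    # Option 2: resize the oplog on a running MongoDB 3.6+/4.x member.
    mongo --host shard1-node1.example.net --port 27018 \
        --eval 'db.adminCommand({ replSetResizeOplog: 1, size: 3584 })'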

Now your total disk space would need to be roughly: dump folder size + oplog + 6 GB (default Mongo installation + system extras). Note: if you cannot afford to allocate the full dump folder size, you will have to run the restore in batches (per DB or per collection, via the nsInclude option), giving Mongo time to compress after importing the BSON. This should only take minutes.
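A sketch of batching the restore per database with --nsInclude, assuming the standard mongodump directory layout (one sub-directory per database); host and path are placeholders:

    # Restore one database at a time so WiredTiger has time to compress
    # between batches; each sub-directory of the dump is one database.
    DUMP_DIR=/usr/local/backups/Dev_backup_07-02-2020

    for db_dir in "$DUMP_DIR"/*/; do
        db_name=$(basename "$db_dir")
        echo "Restoring $db_name ..."
        mongorestore --host mongos.example.net --port 27017 \
            --nsInclude="${db_name}.*" "$DUMP_DIR"
    done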

After the restore, Mongo will shrink the data, and the disk usage will more or less match your standalone data folder size.
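To verify this after the restore, you can compare the on-disk figures reported by db.stats() with your old standalone data folder; the database name and host below are placeholders:

    # Report dataSize and storageSize for one database, scaled to megabytes.
    mongo --host mongos.example.net --port 27017 \
        --eval 'printjson(db.getSiblingDB("myproduct_new").stats(1024 * 1024))'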

If this is not done and your disk space is under-provisioned for the migration, then while the migration is running Mongo will try to increase the disk space. It cannot do that while the primary is in use, so it grows the disk on a secondary and steps the current primary down to secondary, which could potentially cause the above error. You can also check the hardware/status vitals to see whether the servers changed state from primary to secondary.

Also try NOT to enable server auto-upsizing while creating the cluster. I don't have a definite rationale, but you don't want any background action upgrading the hardware (say, from M30 to M40 because the CPU is busy) in the middle of a migration (it happened to me).

Finally, as good practice, restore large databases, especially those with a single large (non-sharded) collection over 4 GB, separately. I had 40+ DBs, about 20% of them over 15 GB in dump BSON size, each with one or two big collections of over 4 GB and multi-million documents. Once I separated them out, it gave Mongo breathing space to bulk insert and compress them over a few minutes of elapsed time. Mongo restore happens at collection level.
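A sketch of that split using --nsExclude/--nsInclude; the collection name below (myproduct_new.metaCrashes, one of the larger collections in the log above) is only illustrative:

    # First pass: everything except the known multi-GB collections.
    mongorestore --host mongos.example.net --port 27017 \
        --nsExclude='myproduct_new.metaCrashes' \
        /usr/local/backups/Dev_backup_07-02-2020

    # Second pass: the big collections one by one, with no parallelism.
    mongorestore --host mongos.example.net --port 27017 \
        --nsInclude='myproduct_new.metaCrashes' \
        --numParallelCollections=1 \
        /usr/local/backups/Dev_backup_07-02-2020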

So instead of taking 40-50 minutes, the restore took 90-120 minutes after some mock runs and ordering.

If you have time to plan, try this out. Please share any other learnings.

Key takeaways: check your dump folder size and large collection sizes. Rate of disk writes, oplog lag, RAM, CPU and I/O are good KPIs to keep watch on during the dump/restore.
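A minimal way to keep an eye on those KPIs while the restore runs, assuming the standard MongoDB monitoring tools are installed (host names are placeholders; the trailing number is the refresh interval in seconds):

    # Per-second server stats: inserts, dirty/used cache, queues, network.
    mongostat --host shard1-node1.example.net --port 27018 5

    # Per-collection read/write time, handy for spotting which collection is loading.
    mongotop --host shard1-node1.example.net --port 27018 5

    # Replication lag of the secondaries (run against the primary; this helper
    # was renamed rs.printSecondaryReplicationInfo() in newer shells).
    mongo --host shard1-node1.example.net --port 27018 \
        --eval 'rs.printSlaveReplicationInfo()'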

Reghu