Using mongorestore for oplogreplay with --numInsertionWorkersPerCollection / concurrently using mongorestore for oplog replay

Question

If I use mongorestore with --numInsertionWorkersPerCollection > 1 for oplog replay, it does not bring any performance improvement. I have an 8-core machine with 64 GB RAM, and my complete oplog is around 1 GB (around 1 million requests on the same collection), so I don't think hardware is the limitation here. What could be the reason behind this?

Basically, I was comparing mongorestore with replica set sync (which is what applies the oplog on a secondary). With sync, there are 16 workers by default that can apply oplog entries concurrently, and I was hoping I could do the same with mongorestore.

score 1 · Accepted Answer · answered May 31 '17 at 09:27


--numInsertionWorkersPerCollection only takes effect when inserting data, not when replaying the oplog.

Judging from the mongorestore source code, oplog replay is single-threaded, so parallel replay does not work.
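
To make the contrast with a secondary's multi-worker apply concrete, here is a minimal Python sketch. It is not mongorestore or mongod code. The entry format (`op`, `ns`, `_id`, `doc`) and every function name are hypothetical simplifications, and the idea of routing entries to workers by a hash of namespace and document id is an assumption about the general approach. Entries for the same document always go to the same worker, so per-document order is kept while different documents can be applied concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def apply_entry(entry, store):
    """Apply one simplified (hypothetical) oplog entry to an in-memory store."""
    key = (entry["ns"], entry["_id"])
    if entry["op"] == "i":      # insert: store a copy of the document
        store[key] = dict(entry["doc"])
    elif entry["op"] == "u":    # update: merge fields into the existing document
        store.setdefault(key, {}).update(entry["doc"])
    elif entry["op"] == "d":    # delete: drop the document if present
        store.pop(key, None)


def apply_serial(entries, store):
    """Single-threaded replay: apply entries strictly in oplog order."""
    for entry in entries:
        apply_entry(entry, store)
    return store


def partition(entries, num_workers):
    """Route entries to buckets by a hash of namespace and _id.

    All entries for one document land in the same bucket, in their
    original relative order.
    """
    buckets = [[] for _ in range(num_workers)]
    for entry in entries:
        key = f"{entry['ns']}:{entry['_id']}".encode()
        buckets[crc32(key) % num_workers].append(entry)
    return buckets


def apply_partitioned(entries, store, num_workers=16):
    """Apply each bucket on its own worker thread.

    Buckets touch disjoint keys, so workers never write the same document.
    """
    buckets = partition(entries, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # list() waits for all workers and re-raises any worker exception.
        list(pool.map(lambda bucket: apply_serial(bucket, store), buckets))
    return store


if __name__ == "__main__":
    oplog = [
        {"op": "i", "ns": "test.c", "_id": 1, "doc": {"x": 1}},
        {"op": "i", "ns": "test.c", "_id": 2, "doc": {"x": 2}},
        {"op": "u", "ns": "test.c", "_id": 1, "doc": {"x": 10}},
        {"op": "d", "ns": "test.c", "_id": 2, "doc": None},
    ]
    # Both strategies must end in the same final state.
    assert apply_serial(oplog, {}) == apply_partitioned(oplog, {}, num_workers=4)
```

This only shows why partitioned apply stays correct; it is not a benchmark, since Python threads will not speed up this CPU-bound toy. A single-threaded replay corresponds to `apply_serial`, which is the behaviour described above for mongorestore.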


JJussi


Thanks for the info, @JJussi! :) But then how would you explain that oplog replay during sync is so much faster than mongorestore's oplog replay? – chetan sharma Jun 02 '17 at 16:02
To support my claim, I performed the following experiment. I disconnected the secondary from the replica set and then performed around 1 M operations on the primary. I then ran the secondary without the replica set (i.e. without --replSet setName) and applied the oplog to it using mongorestore oplog replay. It took around 4 minutes. I then repeated the experiment, but this time I simply reconnected the secondary to the primary (i.e. with --replSet setName), and it took around 1 minute. What is the reason behind such a huge difference? – chetan sharma Jun 02 '17 at 16:13
Different code, different programmers. I haven't checked the mongod code for oplog replay, but it may well be parallel there nowadays. Before the WiredTiger engine, mongod was largely single-threaded too. – JJussi Jun 02 '17 at 16:14
The difference probably comes from the data already being in memory and being replayed from there. – JJussi Jun 02 '17 at 16:16
I also thought it was because of in-memory data, so I kept the dump file in memory using "mount -t tmpfs -o size=5000M none 'filename' " and ran mongorestore on it. It made very little difference. I also tried sync with different numbers of worker threads for oplog replay (i.e. using --setParameter replWriterThreadCount=x) to see whether a single thread would be equivalent to mongorestore, but no luck: sync is still far faster than mongorestore. :( Could you please share your insight, or point me to a suitable resource where I can read more about it? :) Thanks a lot! :) – chetan sharma Jun 02 '17 at 18:13
