  • I have a one-member replica set (a mongod instance on Server1) holding a very large database: about 2.5 million documents, each of them large, and 4 indexes.
  • Then I added another machine (Server2) to this replica set. The mongod on Server2 takes about 5 hours to fetch all the documents in this database.
  • After Server2 has fetched all the documents, it starts building the secondary indexes, which takes around 3 hours.

    • Immediately after it finishes building the secondary indexes, it tries to contact the primary and finds that the socket has timed out.
    • On receiving the timeout error, Server2 simply drops all databases and starts the initial sync again.

    • The relevant snippet from the logs is below:

2016-05-25T11:50:36.053+0000 I -        [rsSync]   Index Build: 2211700/2215091 99%
2016-05-25T11:50:39.221+0000 I -        [rsSync]   Index Build: 2212000/2215091 99%
2016-05-25T11:50:43.300+0000 I -        [rsSync]   Index Build: 2212300/2215091 99%
2016-05-25T11:50:46.103+0000 I -        [rsSync]   Index Build: 2212500/2215091 99%
2016-05-25T11:50:49.068+0000 I -        [rsSync]   Index Build: 2212800/2215091 99%
2016-05-25T11:50:52.218+0000 I -        [rsSync]   Index Build: 2213600/2215091 99%
2016-05-25T11:50:55.439+0000 I -        [rsSync]   Index Build: 2214500/2215091 99%
2016-05-25T11:50:58.738+0000 I -        [rsSync]   Index Build: 2214700/2215091 99%
2016-05-25T11:51:13.223+0000 I -        [rsSync]   Index: (2/3) BTree Bottom Up Progress: 536600/2215091 24%
2016-05-25T11:51:23.285+0000 I -        [rsSync]   Index: (2/3) BTree Bottom Up Progress: 1984500/2215091 89%
2016-05-25T11:51:24.317+0000 I INDEX    [rsSync]   done building bottom layer, going to commit
2016-05-25T11:51:24.508+0000 I INDEX    [rsSync] build index done.  scanned 2215091 total records. 10491 secs
2016-05-25T11:51:25.082+0000 I NETWORK  [rsSync] Socket say send() errno:110 Connection timed out xx.xx.xx.xx:27017
2016-05-25T11:51:25.106+0000 E REPL     [rsSync] 9001 socket exception [SEND_ERROR] server [xx.xx.xx.xx:27017] 
2016-05-25T11:51:25.106+0000 E REPL     [rsSync] initial sync attempt failed, 9 attempts remaining
2016-05-25T11:51:30.106+0000 I REPL     [rsSync] initial sync pending
2016-05-25T11:51:30.433+0000 I REPL     [ReplicationExecutor] syncing from: xx.xx.xx.xx:27017
2016-05-25T11:51:30.563+0000 I REPL     [rsSync] initial sync drop all databases
2016-05-25T11:51:30.564+0000 I STORAGE  [rsSync] dropAllDatabasesExceptLocal 42
2016-05-25T11:51:31.925+0000 I JOURNAL  [rsSync] journalCleanup...
2016-05-25T11:51:31.925+0000 I JOURNAL  [rsSync] removeJournalFiles
2016-05-25T11:51:32.331+0000 I JOURNAL  [rsSync] journalCleanup...
2016-05-25T11:51:32.332+0000 I JOURNAL  [rsSync] removeJournalFiles
2016-05-25T11:51:32.489+0000 I JOURNAL  [rsSync] journalCleanup...
2016-05-25T11:51:32.489+0000 I JOURNAL  [rsSync] removeJournalFiles
  • It has been very frustrating trying to sync this replica set: it keeps repeating the initial sync over and over. Any help is highly appreciated.
VaidAbhishek
  • What is the server config? What is the size of the data? How many members are in the replica set? – Atish May 25 '16 at 12:39
  • Server config? It's pretty standard, with default values. The database that is causing problems is around 200GB on its own. There are 3 members in the replica set: a primary, and two others that fail to graduate from STARTUP2 to SECONDARY. – VaidAbhishek May 25 '16 at 18:20
  • It would be helpful to see the log from the server it is pulling from, for the same time period. – user3973 Jul 09 '16 at 22:39

1 Answer

Perhaps you can use this as a workaround until the problem is diagnosed:

https://docs.mongodb.com/manual/tutorial/resync-replica-set-member/#replica-set-resync-by-copying
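
While diagnosing, one thing that can be read straight from the posted log is how long the [rsSync] thread spent on the index build and how soon afterwards the send error appeared. Below is a minimal Python sketch, assuming the log line format shown in the question. The names `parse_ts` and `index_build_summary` are hypothetical helpers of mine, not part of any MongoDB tooling.

```python
import re
from datetime import datetime

# Timestamp format of the mongod log lines quoted in the question.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"

# Matches e.g. "build index done.  scanned 2215091 total records. 10491 secs"
BUILD_DONE = re.compile(
    r"build index done\.\s+scanned (\d+) total records\. (\d+) secs"
)


def parse_ts(line):
    """Return the leading timestamp of a log line as a datetime."""
    return datetime.strptime(line.split()[0], TS_FORMAT)


def index_build_summary(lines):
    """Return (build_secs, done_at, error_at) for the first completed
    index build and the first errno:110 send error that follows it."""
    build_secs = done_at = error_at = None
    for line in lines:
        match = BUILD_DONE.search(line)
        if match and done_at is None:
            done_at = parse_ts(line)
            build_secs = int(match.group(2))
        elif done_at is not None and "send() errno:110" in line:
            error_at = parse_ts(line)
            break
    return build_secs, done_at, error_at


# The two relevant lines, copied from the log in the question.
sample = [
    "2016-05-25T11:51:24.508+0000 I INDEX    [rsSync] build index done.  "
    "scanned 2215091 total records. 10491 secs",
    "2016-05-25T11:51:25.082+0000 I NETWORK  [rsSync] Socket say send() "
    "errno:110 Connection timed out xx.xx.xx.xx:27017",
]

build_secs, done_at, error_at = index_build_summary(sample)
gap_secs = (error_at - done_at).total_seconds()
print(f"index build took {build_secs / 3600:.2f} h; "
      f"send error {gap_secs:.3f} s after the build finished")
```

On the posted lines this reports a build of 10491 seconds (a little under 3 hours) with the send error arriving less than a second after the build finished. That would be consistent with the connection to the primary going stale while the build was running, but it is only a guess until the primary's log for the same period is checked.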

user3973