
I understand the Oplog collection in a MongoDB replica set. A bigger Oplog means a bigger recovery window when a data-bearing node goes down.

However, for nodes that can never become primary, is the Oplog size important?

The rule for recovering a node that went down is that the newest operation in its Oplog must still be found in the primary node's Oplog. Since the primary node's Oplog is big, and the newest operation of the node we are trying to recover is simply the last operation recorded in its own Oplog (which can be tracked even with a very small Oplog), does the size of a "priority 0" node's Oplog really matter?

I understand that a secondary, priority 1 node could copy new contents into its Oplog from any other data-bearing node by default (even if that node has priority 0). However, is that the only reason to keep a big Oplog on a priority 0 node? If so, just disabling that option and forcing data-bearing nodes to replicate only from the primary node would allow me to keep a small Oplog on a priority 0 node.
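For reference, a minimal mongosh sketch of the setup in question (the member index is a placeholder; rs.reconfig() and rs.printReplicationInfo() are the standard shell helpers):

```javascript
// Make one member "priority 0" so it can never become primary.
// The member index [2] is a placeholder for the node in question.
cfg = rs.conf();
cfg.members[2].priority = 0;
rs.reconfig(cfg);

// Check the oplog size and the time window it currently covers
// on any data-bearing member:
rs.printReplicationInfo();
```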

FCR
  • A secondary node may get the oplog from another secondary, not necessarily from the primary. MongoDB replica set members with 1 vote cannot sync from members with 0 votes (for version 3.2 or newer). Bear in mind, hidden and delayed members have votes greater than 0 - your question is not precise! See [replSetSyncFrom](https://docs.mongodb.com/manual/reference/command/replSetSyncFrom/#mongodb-dbcommand-dbcmd.replSetSyncFrom) (a sketch of that command follows these comments) – Wernfried Domscheit Aug 27 '21 at 10:12
  • I talked about it in my last paragraph. – FCR Aug 27 '21 at 10:17
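As an aside, the replSetSyncFrom command linked above is only a temporary hint; a minimal sketch, assuming a placeholder hostname:

```javascript
// Ask this member to sync from a specific member (temporary hint;
// the server may later pick a different source on its own).
// "mongodb2.example.net:27017" is a placeholder hostname.
db.adminCommand({ replSetSyncFrom: "mongodb2.example.net:27017" });

// Inspect the sync source actually in use (field name in recent versions):
rs.status().syncSourceHost;
```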

1 Answer


The size of the Oplog does not have to be the same for all members, though keeping them equal is recommended. You can configure a specific secondary to have a smaller oplog size, and then test the application with this database configuration.
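A minimal sketch of what that configuration change looks like in mongosh, assuming WiredTiger and MongoDB 3.6+ (the replSetResizeOplog command; 990 MB is the smallest size the server accepts and is used here only as an example):

```javascript
// Run on the secondary whose oplog should be smaller (WiredTiger, 3.6+).
// "size" is in megabytes; 990 is the minimum the server allows.
db.adminCommand({ replSetResizeOplog: 1, size: 990 });

// Confirm the new size and the resulting replication time window:
rs.printReplicationInfo();
```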

Dror
  • What is the outcome when a secondary node syncs from another secondary (or hidden member) that has a small oplog? Most likely the newest operation of the secondary trying to sync will not be found in the (small) oplog of the hidden member. In that scenario, does the secondary attempt to sync from another node, or does it directly become stale? – FCR Aug 27 '21 at 11:06
  • A secondary does not "become" stale. It is always stale by definition (unless your deployment is read-only). – D. SM Aug 27 '21 at 18:56
  • Stale meaning it can't sync anymore. – FCR Aug 27 '21 at 21:54
  • @FCR Secondaries may automatically change their sync source as needed based on changes in the ping time and the state of other members' replication. See Replication Sync Source Selection for more information on the selection criteria: https://docs.mongodb.com/manual/core/replica-set-sync/ – Dror Aug 28 '21 at 11:44
  • @Dror thanks. I can accept your answer if you also help me understand the last question I had: "I understand that a secondary, priority 1 node could copy new contents into its Oplog from any other data-bearing node by default (even if that node has priority 0). However, is that the only reason to keep a big Oplog on a priority 0 node?" – FCR Aug 28 '21 at 12:48
  • @FCR a. You keep a large oplog purely for high availability. Without a large oplog, when a secondary node is down it will not be able to resync all the changes that were made since. In that case you will need to manually copy over the data from some other node (and the more data the source node has, the longer this process will take). Manually syncing nodes is an unnecessary headache for administrators... – Dror Aug 28 '21 at 17:33
  • @FCR b. During a manual sync you will also need to stop the source node, and as we already know, if two nodes of a three-node replica set are down, the cluster won't have a majority of votes to elect a primary; hence the cluster will become read-only during that manual sync. You can imagine the implications of a production database no longer accepting writes – Dror Aug 28 '21 at 17:34
  • @Dror, b is not true; filesystem snapshots do not require shutting down the node: https://docs.mongodb.com/manual/tutorial/backup-with-filesystem-snapshots/. Even with replica set chaining, the hidden node is used as a sync source only on a second pass. And also: "Sync sources are evaluated each time a sync source is updated and each time a node fetches a batch of oplog entries." https://docs.mongodb.com/manual/reference/parameters/#mongodb-parameter-param.maxNumSyncSourceChangesPerHour. So when a new visible node (with enough oplog size) comes back to life, all nodes will re-evaluate it. – FCR Aug 28 '21 at 17:47
  • And as for a., this is true but not actually related to the question I asked. For the record: nodes that go down will need to pick a sync source node. Sync source nodes are picked in two passes. In a nutshell, if the recovering node finds a visible node with enough oplog, it will pick it. If by any chance the recovering node cannot find a visible node with enough oplog, it might take a hidden node as source. However, sync sources are continuously being re-evaluated by secondary nodes; the moment a visible node (with enough oplog) is reachable by the recovering node, it will be used as its new source. – FCR Aug 28 '21 at 18:15
  • @FCR I didn't refer to snapshots because not everyone is familiar with this option, and not everyone is taking snapshots on a regular basis. Also, if you take a snapshot, it might include the config file of the source node (unless you separated the config and the data folders), meaning you may need to update the config file after deploying the snapshot. And in any case, taking snapshots and restoring them is an administration burden and takes time that we don't necessarily want to spend. – Dror Aug 28 '21 at 20:19
  • 1
    @Dror you a right, however by default, config files (in linux) will go to /etc/mongod.conf. When you mount a volume for mongodb data, you would usually do it in a separate volume and mountpoint (/mnt/mongodb) for example. A FS snapshot of the volume mounted in /mnt/mongodb and mounted in some other node for its recovery, it will not interfer with its /etc/mongod.conf file. Additionally, FS snapshots are the recommended way to backup data by MongoDB (as they discarded mongodump and mongonrestore in sharded) https://docs.mongodb.com/manual/core/backups/#back-up-by-copying-underlying-data-files – FCR Aug 29 '21 at 08:45
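For completeness, a minimal sketch of the snapshot-friendly flow referenced above; db.fsyncLock()/db.fsyncUnlock() are real shell commands, while the snapshot itself is taken outside MongoDB (the LVM volume names below are placeholders):

```javascript
// Flush writes and block the node while the filesystem snapshot is
// taken (required when journal and data files live on different volumes).
db.fsyncLock();

// Take the snapshot outside MongoDB here, e.g. with LVM:
//   lvcreate --size 100M --snapshot --name mdb-snap01 /dev/vg0/mongodb
// (volume names are placeholders)

// Unblock writes once the snapshot exists:
db.fsyncUnlock();
```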