
We have a MongoDB replica set installed on Windows servers, with a scheduled backup job. MongoDB is configured to use journaling.

The job runs db.fsyncLock() from a script against MongoDB (the primary server), takes a Kaminario file-system snapshot, and then runs db.fsyncUnlock() from another script.

The lock and unlock run as separate script invocations, so they do not share a connection.
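For comparison, the job could lock, snapshot and unlock from a single process and a single connection, with the unlock in a `finally` block so it runs even if the snapshot step fails. Below is a minimal Python sketch assuming pymongo's `MongoClient` (any object exposing `admin.command()` works). `take_snapshot` is a hypothetical callable standing in for the Kaminario step. The `lockCount` check assumes a MongoDB version whose `fsyncUnlock` reply reports it.

```python
from typing import Any, Callable


def backup_with_fsync_lock(client: Any, take_snapshot: Callable[[], None]) -> int:
    """Flush and lock writes, run the snapshot, and always issue the unlock.

    `client` is assumed to be a pymongo MongoClient or a duck-typed equivalent.
    `take_snapshot` is a hypothetical wrapper around the Kaminario snapshot.
    Returns the fsync lock count still held after our unlock (0 = fully unlocked).
    """
    # Same effect as db.fsyncLock() in the mongo shell: flush to disk, block writes.
    client.admin.command("fsync", lock=True)
    try:
        take_snapshot()
    finally:
        # Runs whether or not the snapshot raised, so this job never exits
        # while still holding the lock it took.
        reply = client.admin.command("fsyncUnlock")
    # Assumption: newer servers include lockCount in the reply. A value above 0
    # means another fsyncLock() is still outstanding and writes remain blocked.
    return int(reply.get("lockCount", 0))
```

A non-zero return value is worth alerting on: fsync locks can be taken more than once, and each needs its own unlock.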

The problem is that the db.fsyncUnlock() doesn't always work, leaving the database locked to writes until it is manually unlocked via shell.

We tried taking snapshots without fsyncLock(). However, our tests showed that:

  1. Restoring only a single database left "holes" in the data (some writing threads had uncommitted changes followed by committed changes).

  2. Restoring the entire database instance failed because of a lock and a missing journal, even though journaling was enabled and the journal was present.

With fsyncLock(), by contrast, the tests passed. The tests were run with writeConcern=2, because we want to ensure durability if MongoDB crashes.


My questions are:

a. How can we guarantee that fsyncUnlock() succeeds after the snapshot is taken?

b. Is there any way to guarantee a consistent backup without using fsyncLock()?


Cross-posted on DBA.SE.


Edit:
After further investigation, it looks like the problem was the snapshot hanging, not the unlock failing. However, that still doesn't explain why a consistent snapshot requires fsyncLock().
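Since the snapshot step is what hung, one hedged mitigation is to bound it with a timeout so the unlock step still gets its turn. This is a minimal Python sketch using the standard library's `subprocess.run`. `cmd` is a placeholder for whatever command triggers the snapshot, because the actual Kaminario tooling is not shown here.

```python
import subprocess
from typing import Sequence


def run_snapshot_with_timeout(cmd: Sequence[str], timeout_s: float) -> bool:
    """Run the snapshot command; return True only if it exits with 0 in time.

    `cmd` is a placeholder (hypothetical) for the real snapshot CLI or script.
    """
    try:
        # On timeout, subprocess.run kills the child, waits for it, and then
        # raises TimeoutExpired.
        result = subprocess.run(cmd, timeout=timeout_s, capture_output=True)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

A timed-out or failed snapshot should be treated as unusable, and the unlock should run regardless (for example, from a `finally` block). Note that killing the child may not stop any grandchild processes the snapshot tool spawns.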

Danny