Live copies for the dev team in MongoDB

Question

Q: What is the best architecture for live copies of the database for testing and development?

Current setup:

We have two Amazon EC2 servers running mongod, set up like this:

Machine A:
    A production database (call it ‘PROD’)
    Other databases (‘OTHER’)

Machine B:
    A pre-production database (call it ‘PRE’)
    A copy for developer 1's own tests (call it ‘DEVEL-1’)
    A copy for developer 2 (‘DEVEL-2’)
    …DEVEL-n

The PRE database is for integration tests before deploying to production.

Each DEVEL-n database lets one developer trash their own data without disturbing the other developers.

From time to time we want to “restore” fresh data from PROD into the PRE and DEVEL-n databases.

Currently we copy PROD into PRE with the .copyDatabase() command, and then run .copyDatabase() n more times to copy PRE into each DEVEL-n.
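The refresh described above amounts to n + 1 full copies run one after another. A minimal Python sketch of that plan makes the linear cost explicit; the function names are hypothetical, and the 1-hour default is only the rough per-copy figure the question reports, not a measurement.

```python
def refresh_plan(num_devs: int) -> list[tuple[str, str]]:
    """Return (source, target) pairs in the order the copies are run."""
    plan = [("PROD", "PRE")]
    plan += [("PRE", f"DEVEL-{i}") for i in range(1, num_devs + 1)]
    return plan


def estimated_hours(num_devs: int, hours_per_copy: float = 1.0) -> float:
    """Copies run sequentially, so total time grows linearly with num_devs."""
    return len(refresh_plan(num_devs)) * hours_per_copy


if __name__ == "__main__":
    for src, dst in refresh_plan(3):
        print(f"copyDatabase {src} -> {dst}")
    print(estimated_hours(3), "hours")  # 4 copies at ~1 hour each
```

With three developers that is four sequential full copies, roughly four hours of a saturated mongod.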

The trouble:

Each copy takes far too long (about 1 hour per copy; the database is over 10 GB), and it usually saturates mongod, so we have to restart the service.

We have looked into:

Dump/restore (saturates the server just as .copyDatabase() does)
Replica sets
Master/Slave (seems deprecated)

Replica sets seem to be the winner, but we have serious doubts:

Suppose we want a replica set to keep B/PRE live-synced with A/PROD (with A as the likely primary and B as the likely secondary):

a) Can I replicate only some databases from A (PROD) and leave the rest (OTHER) alone?

b) Can I have “extra” databases on B (such as DEVEL-n) that do not exist on the master?

c) Can I “stop replicating” so that we can deploy to PRE, test the software with fresh data, and trash that data during testing, and then, once the tests are complete, “re-link” the replica so that the changes made in PRE are discarded and the changes made in PROD are carried over into PRE?

d) Is there a better approach than replica sets for this case?

Thanks. Marina and Xavi.

score 1 · Accepted Answer · answered Apr 13 '13 at 13:52

Replica sets seem to be the winner, but we have serious doubts:

Suppose we want a replica set to keep B/PRE live-synced with A/PROD (with A as the likely primary and B as the likely secondary):

a) Can I replicate only some databases from A (PROD) and leave the rest (OTHER) alone?

As of MongoDB 2.4, replication always includes all databases. The design intent is for all nodes to be eventually consistent replicas, so that you can fail over to another non-hidden secondary in the same replica set.

b) Can I have “extra” databases on B (such as DEVEL-n) that do not exist on the master?

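No. There is only a single primary in a replica set, and secondaries receive writes only through replication, so every member holds the same replicated databases.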

c) Can I “stop replicating” so that we can deploy to PRE, test the software with fresh data, and trash that data during testing, and then, once the tests are complete, “re-link” the replica so that the changes made in PRE are discarded and the changes made in PROD are carried over into PRE?

Since there can only be a single primary, mixing production and test roles in the same replica set is not possible in the way you've envisioned.

Best practice would be to isolate your production and dev/staging environments so that there can be no unexpected interaction.

d) Is there a better approach than replica sets for this case?

There are some approaches you can take to limit the amount of data that needs to be transferred, so that you are not copying the full database (10 GB) across from production each time. Replica sets are suitable as part of the solution, but you will need a separate standalone server or replica set for your PRE environment.

Some suggestions:

Use a replica set and add a hidden secondary in your development environment. You can take backups from this node without affecting your production application, and because the secondary replicates changes as they occur, the backup is a comparatively fast copy over your local network.
Implement your own scheme for partial replication based on a tailable cursor over MongoDB's oplog. The oplog.rs capped collection in the local database is the same mechanism used to relay changes to members of a replica set, and it includes details of inserts, updates, and deletes. You could match on the relevant database namespaces and relay the matching changes from your production replica set into your isolated PRE environment.
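To illustrate the first suggestion, here is a minimal Python sketch that adds a hidden member to a replica set configuration document before it is applied with rs.reconfig() or the replSetReconfig command. The hostnames, the set name rs0, and the function name are hypothetical; the fields (_id, host, priority, hidden, version) are the standard replica set config fields. A hidden member must have priority 0 so it can never become primary, but it still replicates every database.

```python
import copy


def add_hidden_member(config: dict, host: str) -> dict:
    """Return a copy of a replica set config with `host` added as a hidden member."""
    new_cfg = copy.deepcopy(config)
    members = new_cfg["members"]
    if any(m["host"] == host for m in members):
        raise ValueError(f"{host} is already a member")
    next_id = max(m["_id"] for m in members) + 1
    # Hidden members must have priority 0: they replicate all data but can
    # never be elected primary and are invisible to client applications.
    members.append({"_id": next_id, "host": host, "priority": 0, "hidden": True})
    # replSetReconfig expects a higher version than the current config;
    # rs.reconfig() in the mongo shell increments it for you.
    new_cfg["version"] = config["version"] + 1
    return new_cfg


if __name__ == "__main__":
    # Hypothetical hosts: machine A is the production primary,
    # machine B sits in the development environment.
    prod_cfg = {"_id": "rs0", "version": 3,
                "members": [{"_id": 0, "host": "machine-a.example:27017"}]}
    print(add_hidden_member(prod_cfg, "machine-b.example:27017"))
```

Once the hidden member has caught up, you could run mongodump with --host pointed at it to feed PRE without loading the production primary.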

Either of these approaches would give you control over when the backup is transferred from PROD to PRE, as well as the ability to restart from a previous point after testing.
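The oplog-based suggestion, together with restarting from a previous point, could look roughly like the Python sketch below. It is a simplified, hypothetical model: oplog entries are plain dicts using the older oplog field names (ts, op, ns, o, o2), integers stand in for BSON timestamps, an in-memory dict stands in for the PRE database, and only inserts, deletes, full-document updates, and top-level $set/$unset updates are handled. Against a real server, the list would be replaced by a tailable cursor over local.oplog.rs, for example pymongo's find({"ts": {"$gt": last_ts}}, cursor_type=CursorType.TAILABLE_AWAIT).

```python
from typing import Iterable, Optional


def wanted(entry: dict, databases: set[str]) -> bool:
    """True for insert/update/delete entries whose "db.collection" ns is relayed."""
    db = entry.get("ns", "").split(".", 1)[0]
    return entry.get("op") in ("i", "u", "d") and db in databases


def apply_entry(store: dict, entry: dict) -> None:
    """Apply one entry to `store`, modelled as ns -> {_id -> document}."""
    docs = store.setdefault(entry["ns"], {})
    op, o = entry["op"], entry["o"]
    if op == "i":
        docs[o["_id"]] = dict(o)
    elif op == "d":
        docs.pop(o["_id"], None)
    elif op == "u":
        _id = entry["o2"]["_id"]
        if any(key.startswith("$") for key in o):  # modifier-style update
            unsupported = set(o) - {"$set", "$unset"}
            if unsupported:
                raise NotImplementedError(f"unsupported modifiers: {unsupported}")
            doc = docs.setdefault(_id, {"_id": _id})
            doc.update(o.get("$set", {}))  # top-level fields only
            for field in o.get("$unset", {}):
                doc.pop(field, None)
        else:  # full-document replacement
            docs[_id] = dict(o, _id=_id)


def relay(oplog: Iterable[dict], databases: set[str], store: dict,
          resume_after: Optional[int] = None) -> Optional[int]:
    """Apply matching entries newer than `resume_after`; return the last ts seen.

    Commands ("c") and no-ops ("n") are skipped but still advance the checkpoint.
    """
    last_ts = resume_after
    for entry in oplog:
        if resume_after is not None and entry["ts"] <= resume_after:
            continue
        if wanted(entry, databases):
            apply_entry(store, entry)
        last_ts = entry["ts"]
    return last_ts
```

The returned ts acts as a checkpoint. To reset PRE after a test run, you would restore the backup taken at checkpoint T and call relay() again with resume_after=T. Because oplog.rs is a capped collection, this only works while T is still inside the oplog window.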

Many thanks @Stennie, I think your answer is very helpful. – Marina Planells, Jul 22 '13 at 10:58

score 0 · Answer 2 · answered Jan 22 '13 at 18:04


In our setup we use EBS snapshots to quickly replicate the production database into the staging environment. Snapshots are taken every few hours as part of the backup cycle. When a new DB server starts in staging, it looks for the most recent DB snapshot and uses it for its EBS drive. Taking a snapshot is almost instant, and recovery is also very fast. This approach also scales very well; we actually use it in a huge sharded MongoDB installation. The only downside is that you have to rely on AWS services to implement it, which can be undesirable in some cases.
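The “look for the most recent DB snapshot” step could be sketched in Python as below. The dicts mirror the shape of EC2 DescribeSnapshots results (SnapshotId, State, StartTime, Tags), such as those returned by boto3's describe_snapshots(OwnerIds=["self"]); the Role tag used to pick out database snapshots and the snapshot IDs are hypothetical conventions.

```python
from datetime import datetime, timezone
from typing import Optional


def latest_completed_snapshot(snapshots: list[dict], role: str) -> Optional[dict]:
    """Return the newest completed snapshot tagged Role=<role>, or None."""
    candidates = [
        s for s in snapshots
        if s.get("State") == "completed"
        and {"Key": "Role", "Value": role} in s.get("Tags", [])
    ]
    return max(candidates, key=lambda s: s["StartTime"], default=None)


if __name__ == "__main__":
    tag = [{"Key": "Role", "Value": "mongodb-prod"}]  # hypothetical tag
    example = [
        {"SnapshotId": "snap-a", "State": "completed", "Tags": tag,
         "StartTime": datetime(2013, 1, 22, 6, tzinfo=timezone.utc)},
        {"SnapshotId": "snap-b", "State": "pending", "Tags": tag,
         "StartTime": datetime(2013, 1, 22, 12, tzinfo=timezone.utc)},
    ]
    # snap-b is newer but still pending, so snap-a is chosen.
    print(latest_completed_snapshot(example, "mongodb-prod")["SnapshotId"])
```

The chosen SnapshotId would then be used to create a volume (for example with boto3's ec2.create_volume(SnapshotId=..., AvailabilityZone=...)) that is attached to the new staging server.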


Michael Korbakov


Thank you Michael, but this backup solution fits neither our problem with the differing sets of databases nor our need to stop and restart propagation for testing. – Marina Planells Jan 25 '13 at 11:01
It definitely doesn't satisfy your requirements precisely, but it's possible that your requirements were shaped by the specific solution you foresee. It's also entirely possible that I didn't grasp the background behind your requirements correctly :) I tried to explain how we handle the problem of providing a DB environment for integration testing in our system. In this setup there's no need to limit the replicated databases, because performance is good no matter how many DBs you replicate. Continuous replication is also unnecessary if you take regular snapshots and recovery is fast. – Michael Korbakov Jan 25 '13 at 11:34
