
It is possible to configure remote data replication at many layers of the storage stack. Some examples of what I mean by layers:

  • Physical Volume Layer (TrueCopy, SRDF/MirrorView, SnapMirror)
  • Virtual Server Layer (vReplicator, Veeam)
  • Logical Volume Layer (VVR, AVS)
  • Filesystem Layer (rsync, ZFS send/recv; see the sketch after this list)
  • Application Layer (Data Guard, MySQL/PostgreSQL Replication, AD Replication)
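
For concreteness, the sketch below is roughly what I mean by the filesystem layer: periodically pushing ZFS snapshots to a DR host with zfs send/recv piped over ssh (rsync over ssh would be the rough equivalent for non-ZFS filesystems). The dataset names and the DR hostname are made up and error handling is minimal; it's only meant to show where in the stack this layer sits.

    #!/usr/bin/env python3
    """Minimal sketch: filesystem-layer replication via ZFS send/recv over ssh.
    Dataset names and the DR hostname are hypothetical."""
    import subprocess
    from datetime import datetime, timezone

    DATASET = "tank/appdata"            # hypothetical source dataset
    DR_HOST = "dr-site.example.com"     # hypothetical DR host
    DR_DATASET = "tank/appdata"         # hypothetical target dataset

    def replicate(previous_snap=None):
        # Take a new snapshot on the source.
        snap = "{}@repl-{:%Y%m%dT%H%M%SZ}".format(DATASET, datetime.now(timezone.utc))
        subprocess.run(["zfs", "snapshot", snap], check=True)

        # Full send the first time, incremental (-i) on later runs.
        send_cmd = ["zfs", "send"]
        if previous_snap:
            send_cmd += ["-i", previous_snap]
        send_cmd.append(snap)

        # Pipe the replication stream over ssh into zfs recv on the DR host.
        send = subprocess.Popen(send_cmd, stdout=subprocess.PIPE)
        subprocess.run(["ssh", DR_HOST, "zfs", "recv", "-F", DR_DATASET],
                       stdin=send.stdout, check=True)
        send.stdout.close()
        if send.wait() != 0:
            raise RuntimeError("zfs send failed")
        return snap  # remember this for the next incremental run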

Replication at each layer has a unique value proposition, but it would be crazy to use them all. My particular problem is the proliferation of different solutions; I'd like to standardize on one or two in order to keep down license costs, complexity, and expertise requirements.

My question is: ignoring the individual product examples, which layer or combination of layers of replication provides the most value in a typical corporate environment, and which is becoming more valuable?

Possible metrics include ease of administration, availability of expertise, protection from data corruption, recovery time/complexity in a DR scenario, reduction in planned outage windows, application performance, application flexibility, and anything else that might be valuable.

This is a question I encounter often as a system administrator, so I would appreciate any advice.

Tom Shaw

2 Answers


Which solution is better suited to a given scenario depends strongly on the applications and data involved; there is no "global" approach that works every time, everywhere.

There really are too many different scenarios involved. Some examples:

  • Active Directory domain controllers are designed to work as a multi-master, replicated, distributed database, and in a WAN setup you also need local DCs, because domain-joined Windows machines need a DC available in order to work properly. At the same time, AD wouldn't benefit at all from lower-level replication at the virtual server or physical storage layer: you can't have a DC "mount" a database copied from another one, nor can you start a cloned DC and expect it to work (don't even try; USN rollback is nasty).
  • For a file server, the exact opposite is true: while there are application-level replication technologies (DFS Replication in Windows), for large amounts of data storage-level replication is probably the best approach.
  • Now, let's say you are using Exchange. Recent Exchange versions (2007-2010) are designed with transactional database replication in mind, and this is the only officially supported approach. You actually can copy an Exchange database and mount it on another Exchange server (that's called "database portability"), so storage-level replication could be made to work... but it's a lot more painful, and moving users to such a server requires heavy reconfiguration. Also, while in the two scenarios above there is a real point in having local copies of the data in multiple places (AD is needed everywhere, and users in different offices may need to access the same file server data), for Exchange this only makes sense in a disaster-recovery scenario, because in normal operation each user only accesses his/her own mailbox, which can only be active on a single server at a time.
  • For various database systems, again, this could work either at the application level or at any underlying level; but it strongly depends on the application: if you need to modify data in multiple places (as opposed to only reading it), there is no way storage replication can merge conflicting copies, and you will need to let your DBMS handle that.

Each technology has its own benefits and use cases; there really is no "one size fits all" approach.

Massimo
  • Thanks for your comment. Despite your pessimism, to me your answer sounds quite positive: that in most cases replication can be handled best at the application layer. File servers need to be handled differently, but the trend for file servers is towards NAS appliances anyway, i.e. towards simplified management with well-documented DR features, so this is not such an issue. I'm keen to hear any more exceptions. – Tom Shaw May 31 '11 at 01:17
  • @Tom, I didn't mean by any means to appear pessimistic :-) I totally agree on handling data replication at the application level *for most cases*... but this makes all those new and great lower-level replication technologies seem quite useless, hence what could be perceived as "pessimism". Also, each application handles its replication however it sees fit, so there is absolutely no standardization here. – Massimo May 31 '11 at 04:13
  • Thanks, great answer. I don't mind so much that each application is different, because managing a particular app is a self-contained skillset, whereas coordinating data up and down the stack is much harder! Perhaps VM replication has a place as long as it is seen as a pure DR technology which is not a substitute for application-specific replication. – Tom Shaw May 31 '11 at 05:41
  • @Tom: absolutely agreed, VM and SAN replication are great for DR (if you have enough bandwidth to the DR site, of course). – Massimo May 31 '11 at 06:27

I've so far considered two options. The over-arching idea of both is to assign responsibility for replication to one team only: either the system administrators or the application administrators. With only one team involved, you decrease the complexity of the system and the risk.

Standardize on Virtual Server Layer Replication

By configuring replication at the virtual server layer only, you can treat each operating system and the applications running on it as a black box. The system administrators are in control.

Advantages:

  • The DR procedure for all applications is identical: stop the replication, start the DR virtual servers, and ask the application teams to confirm that all is well (see the sketch at the end of this section).
  • It is easy to confirm that you have complete coverage of all data.
  • It is easy to plan server/storage maintenance because the involvement of the application teams is minimal.

Disadvantages:

  • The DR servers are idle the majority of the time: a possible waste of server resources.
  • It can be difficult to test DR without a full failover given that IP addresses match Prod.
  • Replication is continuous, so any data corruption in Prod is propagated to DR almost immediately.
  • It may be difficult to use this method to clone to Dev/Test.
  • Can't be used with systems that don't run well in a VM.

Can be used in combination with physical volume layer technologies to enhance performance of replication (while still being managed from the VM software).
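
The appeal of this black-box approach is that the failover runbook can be a single generic script that never needs to know what runs inside each VM. Below is a minimal sketch of that idea; the VM names are invented and the two helper functions are hypothetical placeholders for whatever calls your replication and hypervisor products actually provide.

    #!/usr/bin/env python3
    """Sketch of a single, application-agnostic DR failover procedure:
    stop replication, power on the DR copies, hand off to the application teams.
    VM names and helper functions are hypothetical placeholders."""

    # One inventory covers every protected VM, regardless of what runs inside it.
    PROTECTED_VMS = ["erp-db-01", "erp-app-01", "fileserver-01", "mail-01"]

    def stop_replication(vm):
        """Placeholder: break/pause replication for one VM (product-specific call)."""
        print("stopping replication for", vm)

    def power_on_dr_copy(vm):
        """Placeholder: power on the DR-side replica of the VM (product-specific call)."""
        print("powering on DR copy of", vm)

    def run_failover():
        # Step 1: stop replication so the DR copies become writable.
        for vm in PROTECTED_VMS:
            stop_replication(vm)
        # Step 2: start every DR virtual server.
        for vm in PROTECTED_VMS:
            power_on_dr_copy(vm)
        # Step 3: hand off; only the application teams can confirm all is well.
        print("Notify application teams to verify their services.")

    if __name__ == "__main__":
        run_failover()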

Standardize on Application Layer Replication

By configuring replication at the application layer only, you can treat the server, storage, and operating system as a black box. The application administrators are in control.

Advantages:

  • Many applications can run in a multi-master or master-slave mode to take advantage of all server resources.
  • Many applications can transmit changes in a concise format, saving on bandwidth and improving performance.
  • Some applications can apply changes at DR with a configurable delay, so corruption in Prod is not propagated immediately (see the sketch at the end of this section).
  • It is relatively easy to sever replication and re-configure servers as Dev/Test.

Disadvantages:

  • Each application team must configure replication in their own unique way.
  • Not all applications support this kind of replication, so there needs to be a fall-back option.

Can be used in combination with filesystem layer technologies to support applications without the ability to replicate.
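
As one example of the delayed-apply advantage above, the sketch below configures a MySQL (5.6 or later) replica at the DR site to apply changes one hour behind the primary, so corruption on Prod does not reach DR for an hour. The hostnames, credentials and the one-hour figure are invented, it assumes GTID-based replication and the mysql-connector-python package, and other DBMSs (Oracle Data Guard, PostgreSQL) have equivalent delayed-apply options.

    #!/usr/bin/env python3
    """Sketch: application-layer replication with a deliberate apply delay.
    Configures a MySQL (5.6+) replica that lags the primary by one hour.
    Hostnames and credentials are hypothetical."""
    import mysql.connector

    # Hypothetical DR-site replica we are (re)configuring.
    replica = mysql.connector.connect(
        host="dr-mysql.example.com", user="admin", password="secret")
    cur = replica.cursor()

    # Point the replica at the production primary and delay the apply of
    # changes by 3600 seconds (assumes GTID-based replication is enabled).
    cur.execute("STOP SLAVE")
    cur.execute("""
        CHANGE MASTER TO
            MASTER_HOST = 'prod-mysql.example.com',
            MASTER_USER = 'repl',
            MASTER_PASSWORD = 'repl-password',
            MASTER_AUTO_POSITION = 1,
            MASTER_DELAY = 3600
    """)
    cur.execute("START SLAVE")
    replica.close()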

Tom Shaw
  • Your implicit assumption here is that system/storage administrators and application administrators are two independent units working without coordination. While this might be the case, I'd expect an organisational change to reduce the friction between these two units to yield a better result than creating an overly fixed and inflexible policy about replication. In fact, I'd think that using *both* (replication at the storage or virtualization layer along with application layer replication) would give you the greatest flexibility and the richest feature set, of course at the cost of additional complexity. – the-wabbit May 30 '11 at 21:22
  • @syneticon-dj: Yes, there is a cost to flexibility and features. I've seen convoluted DR procedures which require action by the storage guy, network guy, sysadmin, DBA and appserver guy. DR testing takes hours and never quite works as planned. Regarding organisational change: these complex systems are often designed by one superstar who knows the whole stack. Then that person leaves and it is simply impossible to find someone with the breadth of skill to understand it all. If no-one knows how to use and fix it, it's worse than useless. – Tom Shaw May 30 '11 at 23:32
  • Ah, now that you narrow the problem down to disaster recovery, it gets much more practical indeed. As disaster recovery has a clear goal of minimized time-to-recovery, I would agree with you that using just one technology and simplifying the processes is the way to go. – the-wabbit May 31 '11 at 07:05
  • @syneticon-dj: Cheers. For the record, here's what I've learned. Application layer replication is almost essential because it is the only clean way of achieving many functionality goals. It may also be sufficient by itself for DR purposes. A lower-layer technology like VM replication or volume replication of entire servers may be added to make a full DR failover quicker and easier and to ensure complete data coverage. It gets painful when you mix strategies and try to combine lower-layer technology with application migration. – Tom Shaw May 31 '11 at 08:32