It seems like every night during our backup process, the Exchange DAG fails over. Is it possible to change a tolerance setting to prevent the DAG from unnecessarily failing over during backup times?

marcwenger
  • "seems like" is not enough information to troubleshoot with. What evidence do you have that the DAG is failing over? Event logs? Monitoring events? Edit your question and include more detail. – longneck Jan 31 '13 at 18:52
  • 1. Solarwinds tells us it fails over; 2. When we check the server the standby databases are active; 3. Event logs confirmed switching to passive node (can't find reason for the failover); 4. Usually happens around the same time of day (backups are on schedule). – marcwenger Jan 31 '13 at 19:05
  • OK, so dig through the event logs on your Exchange servers and find out what happened around the time of the failover. – longneck Jan 31 '13 at 19:08
  • Here's one example of a log item before the fail over: Failure Item (Namespace=1, Tag=16, Database=CN=Employees A-M,CN=Databases,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=First Organization,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=xxxxxxxxxxxxxx,DC=local, Instance=Employees A-M) – marcwenger Jan 31 '13 at 19:10
  • What triggers a failover? Database unavailability, disk I/O, loss of network connectivity? Figure that out and then focus on why the failover is being triggered by that condition or event. – joeqwerty Jan 31 '13 at 20:15
  • In hindsight, I was rushed when writing the original question. Three servers in the main site: CAS/HUB, MB1, and MB2 for archives. MB1 is replicating with another MB in a different site. All servers are Exchange 2010 SP2 with Rollup 4 on Windows Server 2008 R2 SP1. All of them run on VMware ESXi 5, and the VMs have been set up not to run on the same physical host. The backup method includes VMware Data Recovery (DR), which uses snapshots (so the snapshot being the cause is a possibility). – marcwenger Feb 08 '13 at 00:36
  • 20 Mbps connection between the two sites; backup for MB1 occurs in same site as server – marcwenger Feb 08 '13 at 00:52
  • Make that 30 Mbps MPLS connection – marcwenger Feb 08 '13 at 03:32

1 Answer

You have not told us nearly enough to go on, but I've got a guess. If your Exchange servers are virtual and you're using a snapshot-based backup mechanism (like Veeam, for example), then there is a slight pause while the snapshot is taken (and possibly again when it is deleted). If you're using a VSS-capable backup process, then the backup is consistent, but the slight pause makes the other members of the DAG think that your VM is down, so the DAG fails over.
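To make the "slight pause" guess concrete, here is a minimal sketch of the arithmetic, not a description of how the cluster service actually decides. It assumes a simplified model in which a node is declared down once it has been silent for heartbeat delay × missed-heartbeat threshold. The defaults of 1000 ms and 5 are assumed values for the failover-cluster properties `SameSubnetDelay` and `SameSubnetThreshold`; check what your own cluster reports before relying on them. The function names are hypothetical.

```python
# Simplified, hypothetical model: a node is treated as down once it has
# been silent for (heartbeat delay x missed-heartbeat threshold).
# The 1000 ms / 5 defaults are assumptions, not values read from a cluster.

def heartbeat_tolerance_ms(delay_ms: int, threshold: int) -> int:
    """How long a node can stay silent before its peers give up on it."""
    return delay_ms * threshold


def pause_triggers_failover(pause_ms: int, delay_ms: int = 1000,
                            threshold: int = 5) -> bool:
    """True when a snapshot pause lasts at least as long as the tolerance."""
    return pause_ms >= heartbeat_tolerance_ms(delay_ms, threshold)


if __name__ == "__main__":
    # Compare the assumed defaults with a more relaxed setting (2000 ms x 10).
    for pause in (2000, 6000, 15000):
        print(pause,
              pause_triggers_failover(pause),
              pause_triggers_failover(pause, delay_ms=2000, threshold=10))
```

Under these assumptions, a pause of a few seconds is enough to cross the default tolerance, which is why raising the delay or threshold is the usual kind of "tolerance setting" to look at.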

If none of that applies to you, then throw us a frickin' bone - what's your Exchange environment, what's your backup environment? Otherwise, you're just asking us "How long is a piece of string?"

mfinni