-1

We were hosting a website and a CMS at an external supplier. They told us that they host everything on Microsoft Azure. Yesterday I saw that my website was down and contacted them. Later that day they told me that our server had a "virtual hard drive" failure, and that the newest data, which was not backed up, is lost. I know that Ubuntu 14.04 was running on that machine. I also know that all storage on Azure is at least three times redundant, except for the temporary storage. So I assume that they either used the temporary storage for their data, or the failure was not a hard drive failure. I googled for similar incidents in the past but couldn't find any. Also, all my own Azure machines have been running happily for ages.

What could cause a virtual disk failure to result in the loss of all my data? This is not meant to be an opinion-based question along the lines of "do you trust my supplier". I would like to know the possible reasons for a hard drive failure on three-times-redundant storage, assuming that no admin accessed Azure and manually stopped and killed the machine.

RayofCommand
  • 2
    "I also know that every storage on Azure is 3 times redundant if not more." If it's anything like AWS, that's just for their blob storage (in AWS, S3), not virtual disks attached to an instance. You should always have off-site backups, if for no other reason than someone deleting all your files deliberately would be replicated across all nodes. – ceejayoz Apr 29 '16 at 19:10
  • 1
    Corruption comes to mind immediately. Multiple copies of corrupt data still leaves you with corrupt data you can't recover. Happened to my employer a while back. One of their servers had a snapshot that was made in an old snapshot format, and the server was migrated through an ESXi upgrade or two without having the snapshot consolidated. A couple years later, for no ostensible reason, enough was enough, the machine crashed, and we had a couple TB worth of corrupted, unrecoverable delta disks, perfectly copied to four separate disk locations. – HopelessN00b Apr 29 '16 at 19:17
  • 2
    No, the C: drive is also triple redundant. https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-about-disks-vhds/ – RayofCommand Apr 29 '16 at 19:55
  • 2
    @ceejayoz all Azure OS disk storage is at least triple redundant, disks are stored in blob storage. – Sam Cogan Apr 30 '16 at 10:27
  • Correct, only corruption might kill it, I guess. – RayofCommand Apr 30 '16 at 10:41
  • 2
    As others have said, Azure has very good fault tolerance and very high SLAs; this is unlikely to be an actual Azure issue. It's much more likely they did something wrong and are now just trying to blame it on Azure. – Massimo Apr 30 '16 at 10:51
  • I know that on some VM types the storage is not persistent, such as RDP servers, since they ask for a SQL server or similar to store the data somewhere else. I have seen a user lose a db3 database that way: they stored it on a single RDP server on Azure, but the storage was lost when the VM restarted. On Azure, for RDP, the system will load up another instance if needed to support the load, so keeping data locally defeats that idea. – yagmoth555 May 16 '16 at 16:07

1 Answer

4

As you correctly state, all Azure disk storage (except for temporary disks) is replicated three times within the same datacenter, and if you use geo-replication then another three times in another datacenter, so realistically disk failure is an unlikely cause. There are a few reasons I can think of that might explain this:

  • As HopelessN00b mentions, it could be data corruption. If some data gets corrupted, the corruption is quickly replicated across all your storage replicas. The only thing to do then is to restore from backup.
  • Data was stored on the temporary drive. All Azure VMs get a second, temporary drive. It is attached to storage on the local host, is not redundant, and will get wiped if the machine moves to a new host. It should only be used for temporary data.
  • User error. I suspect this is the most likely cause: someone did something or deleted something they shouldn't have, and no amount of replication will help with that. Again, backups are your friend here. You would hope that if this is the case they would own up to it.
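
To check the temporary-drive explanation, one option is to see which mount point the CMS data actually lived on. The following is only a rough sketch: the mount points in `ASSUMED_TEMP_MOUNTS` are an assumption about where Linux images on Azure commonly mount the temporary disk, and the device names and paths in `SAMPLE_MOUNTS` are hypothetical, so verify both against the real machine (for example with `df -h`).

```python
import os

# Assumption: common mount points for the Azure temporary disk on Linux.
# Confirm on the actual VM before relying on this list.
ASSUMED_TEMP_MOUNTS = ("/mnt", "/mnt/resource")

# Hypothetical /proc/mounts-style text, used here for illustration only.
SAMPLE_MOUNTS = (
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "/dev/sdb1 /mnt ext4 rw,relatime 0 0\n"
    "/dev/sdc1 /datadrive ext4 rw,relatime 0 0\n"
)


def parse_mount_points(mounts_text):
    """Return the mount points (second field) from /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            points.append(fields[1])
    return points


def mount_point_for(path, mount_points):
    """Return the longest mount point that contains `path` (default "/")."""
    path = os.path.normpath(path)
    best = "/"
    for mp in mount_points:
        mp = os.path.normpath(mp)
        prefix = mp if mp.endswith("/") else mp + "/"
        if (path == mp or path.startswith(prefix)) and len(mp) > len(best):
            best = mp
    return best


def is_on_temp_disk(path, mount_points, temp_mounts=ASSUMED_TEMP_MOUNTS):
    """True if `path` lives on one of the assumed temporary-disk mounts."""
    return mount_point_for(path, mount_points) in temp_mounts


mounts = parse_mount_points(SAMPLE_MOUNTS)
for candidate in ("/var/lib/mysql", "/mnt/cms/uploads", "/datadrive/db"):
    print(candidate, "-> on temp disk?", is_on_temp_disk(candidate, mounts))
```

On a real VM you would pass the contents of `/proc/mounts` to `parse_mount_points` instead of the sample text. If the database or upload directory turns out to sit on the temporary disk, that alone would explain the data loss without any failure of the replicated storage.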
Sam Cogan