Question: Can the Azure portal report start/end times for any 'live migration' of VMs?

My company is migrating VMs from our local data centre (VMware ESX Server) to the Azure cloud (Microsoft hypervisor). Azure has a feature called 'live migration', which automatically migrates VMs between Azure server instances. When a live migration occurs, the VM is paused for a few seconds.

I suspect ‘live migration’ events may be causing intermittent performance slowdowns we're seeing 3-4 times a month. Our servers in Azure receive tens of requests a second. Their internal metrics (as measured by themselves) seem fine. But other connected servers see intermittent performance slowdowns.

Additional info: Microsoft Blog on Live Migration: https://azure.microsoft.com/en-au/blog/improving-azure-virtual-machine-resiliency-with-predictive-ml-and-live-migration/

We've seen a similar problem before. 'Live migration' seems equivalent to a VMware feature called 'vMotion/DRS' that automatically migrates VMs between physical servers for load balancing. For instance, if a physical server gets heavily loaded, vMotion/DRS automatically moves VMs to another blade in our data centre. Several years ago, we observed that vMotion/DRS was causing problems with clustering software, and we had to disable it for some VMs.
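
If start/end times for live migrations can be obtained, the check I have in mind is roughly the sketch below: flag any slowdown that falls inside a migration window, widened by a small tolerance. The function name `overlapping_events`, the 30-second tolerance, and all timestamps are hypothetical and for illustration only; none of this is real Azure output.

```python
from datetime import datetime, timedelta

def overlapping_events(migrations, slowdowns, tolerance=timedelta(seconds=30)):
    """Return (slowdown, (start, end)) pairs where the slowdown time falls
    within a migration window widened by `tolerance` on both sides."""
    matches = []
    for slow in slowdowns:
        for start, end in migrations:
            if start - tolerance <= slow <= end + tolerance:
                matches.append((slow, (start, end)))
    return matches

# Hypothetical sample data: (start, end) of each migration, as it might be
# supplied by the provider, and the times at which we observed slowdowns.
migrations = [
    (datetime(2019, 1, 10, 3, 15, 0), datetime(2019, 1, 10, 3, 15, 8)),
    (datetime(2019, 1, 14, 22, 40, 0), datetime(2019, 1, 14, 22, 40, 5)),
]
slowdowns = [
    datetime(2019, 1, 10, 3, 15, 20),
    datetime(2019, 1, 12, 9, 0, 0),
]

for slow, (start, end) in overlapping_events(migrations, slowdowns):
    print(f"slowdown at {slow} is near migration {start} to {end}")
```

With only 3-4 slowdowns a month, a handful of matches would not prove causation, but a consistent lack of matches would rule live migration out.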

Happyblue
  • AFAIK these events are not reported. But the alternative is the VM gets killed and restarted. – Michael Hampton Jan 18 '19 at 02:27
  • Thanks. We've raised a call with Microsoft to obtain this info. – Happyblue Jan 18 '19 at 06:14
  • @MichaelHampton, the alternative needn't be a VM-kill. If hardware maintenance is coordinated with customers, VMs can be brought down cleanly. Or maybe live migration isn't the problem -- if MS can provide their event times, we can correlate with our outage times – Happyblue Jan 18 '19 at 06:26
  • MS will not co-ordinate maintenance events with you; that is the point of a cloud service. AWS and GCP are the same. – Sam Cogan Jan 29 '19 at 18:15
  • Thanks @SamCogan. I suspect live migration isn't just for hardware maintenance - Azure uses it to balance load too (i.e. can happen every few days). What I have is very intermittent - I'm waiting on past live migration event times to compare to our outage times. – Happyblue Feb 05 '19 at 00:17

0 Answers