We have a 4-host ESXi 6.5 cluster with DRS fully automated. When checking the history, we see that one specific (big) VM (6 CPUs, 64 GB memory) is vMotioned by DRS roughly 10 times per day. Someone on the team claims we should make DRS less aggressive and exclude this big machine from DRS.

But I'm wondering, what's the point of that? Can't we just let DRS do its job since a vMotion should have no impact on guest and cluster performance? I'd like to have some arguments to tell him not to make things too complex by applying exclusions and so on.

Stuggi
EsTeGe

3 Answers

vMotions do have a small impact on the cluster: each one eats up a bit of hypervisor time and, obviously, uses network bandwidth too. Generally speaking, leaving DRS on makes sense, but if you want to lower the aggression, that's fine too. Given the VM's resource requirements, I wonder whether it moving around this much means the cluster needs more CPU and/or memory. Also, why have you not moved to 6.7 yet?

Chopper3

You're moving tens of GB of RAM over the network from one host to another, so you DO have an impact. I would strongly recommend making DRS less aggressive. You gain nothing by moving VMs 10 times a day; DRS will help you get to an overall balanced load in the cluster and then roughly maintain it when you create new VMs (you will get a recommended target host). It will also rebalance the cluster when there are larger discrepancies between the hosts.
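To put a rough number on "tens of GB", here is a back-of-the-envelope sketch. The 64 GB and 10 moves per day come from the question; the 10 Gbit/s vMotion link is purely an assumption, since the question doesn't say what the network looks like. It also assumes the VM's full memory is copied exactly once at line rate, ignoring dirty-page re-copies, protocol overhead and competing traffic, so treat the result as an order-of-magnitude estimate only.

```python
def vmotion_copy_seconds(memory_gb: float, link_gbit_per_s: float) -> float:
    """Idealised time to copy a VM's memory once over the vMotion link.

    Assumes the whole memory is copied exactly once at full line rate;
    real migrations re-copy dirtied pages and share the link.
    """
    memory_gbit = memory_gb * 8  # 1 byte = 8 bits
    return memory_gbit / link_gbit_per_s


VM_MEMORY_GB = 64        # from the question
VMOTIONS_PER_DAY = 10    # from the question
LINK_GBIT_PER_S = 10     # assumption: dedicated 10 GbE vMotion network

per_move = vmotion_copy_seconds(VM_MEMORY_GB, LINK_GBIT_PER_S)
daily_traffic_gb = VM_MEMORY_GB * VMOTIONS_PER_DAY

print(f"roughly {per_move:.1f} s of line-rate copying per vMotion")
print(f"roughly {daily_traffic_gb} GB copied per day for this one VM")
```

Swap in your own link speed; on a 1 Gbit/s vMotion network the same arithmetic gives ten times the copy time per move.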

Fatman
  • So you suggest only lowering the Migration Threshold (it's currently at 3), but not putting the VM on some exclusion list for DRS? – EsTeGe Jul 23 '20 at 11:15
  • It is difficult for me to say without knowing more about your environment. I usually disable DRS on database, high-IO and latency-sensitive servers, but it is perfectly fine to disable DRS for anything that fits your needs (just keep it an exception, not the rule). For example, you might want to keep your large VMs in place and rebalance the load using your smaller ones. – Fatman Jul 24 '20 at 12:50
  • For clustered servers, license constraints, etc., use affinity rules. Focus on having the ESXi hosts somewhat balanced (you don't want one at 90% and the others at 60%) without going overboard, and avoid frequent migrations. "Play" with the settings until you find the right ones for you. – Fatman Jul 24 '20 at 12:50

First and foremost, the logic behind why DRS moves something is very complicated, so trying to figure out why it does something is usually the path to madness.

That being said, lowering the aggression setting is what's usually done when DRS is a bit too trigger-happy, unless there's some other obvious underlying issue, such as a VM being too close to the maximum configuration of a host (VMware isn't a very happy camper if you assign 90% of host resources to a single VM). The aggression setting doesn't really matter that much: DRS will still kick in if any host becomes too congested, it'll just be less eager to move things around otherwise. As I stated above, DRS considers so many factors that the aggression setting isn't really comparable between environments. 3 is usually a good starting point, but some environments need it dropped down a notch or two.

Exclusions are a bit of a different beast; they are best reserved for VMs that don't take too kindly to being moved. An example is hot-standby software that checks very frequently whether its peer is online. I've seen applications that start to fail over if the hot peer is unresponsive for more than a millisecond. Another use for exclusions is VMs that you want to stay put. A good example is a stretched cluster spanning multiple datacenters: there it makes sense to exclude your domain controllers from DRS and manually place them on certain hosts in certain datacenters, so that DRS doesn't get too clever and place them all in the same datacenter.

Stuggi