
In a multi-tenant virtualised LAN environment it is likely that different customers will have VMs in the same subnet (e.g. 192.168.1.0/24) and be hosted on a single virtual server environment. The overlapping subnets are not a problem for the customers because they do not need to communicate with each other.

But when these servers are virtualised on the same virtual server environment (e.g. VMware or SPARC) they are no longer able to send traffic out on the wire, even though the two servers are on different VLANs (only the initial ARP request is seen). I would have expected that VLAN separation would allow the vSwitch to handle the traffic for each VM separately despite the overlap.

A physical switch does this by considering each VLAN to be a separate Layer-2 domain and does not use IP header information in its forwarding decision. However, there must be something (possibly defined in the standards) that causes the vSwitch to ignore the different VLANs in its forwarding decision, inspect the IP header instead, and make the false assumption that it can satisfy the connectivity requirement by switching locally. Does anyone know if there is a standards-based reason why this happens, or has anyone experienced this situation and identified a root cause?

2 Answers


"In a multi-tenant virtualised LAN environment it is likely that different customers will have VMs in the same subnet"

Only if it is incorrectly designed, built and secured; why on earth would anyone allow that to happen?

"and be hosted on a single virtual server environment"

If you mean all the servers would be running the same hypervisor, then sure; if you mean they all run on one server, then that's almost certainly incorrect.

"But when these servers are virtualised on the same virtual server environment (e.g. VMware or SPARC) they are no longer able to send traffic out on the wire, even though the two servers are on different VLANs (only the initial ARP request is seen)"

Again this would only be the case if the system was designed, built or secured incorrectly.

"I would have expected that VLAN separation would allow the vSwitch to handle the traffic for each VM separately despite the overlap"

Again it should/can/could if done right.

"A physical switch does this by considering each VLAN to be a separate Layer-2 domain and does not use IP header information in its forwarding decision. However, there must be something (possibly defined in the standards) that causes the vSwitch to ignore the different VLANs in its forwarding decision, inspect the IP header instead, and make the false assumption that it can satisfy the connectivity requirement by switching locally"

Nope. Unless you're using VMware's NSX/NSX-T, or have gone out of your way to force the vSwitch to deal with layer-3 traffic, you only get a plain L2 virtual switch that doesn't even know you're talking IP at all.

"Does anyone know if there is a standards-based reason why this happens, or has anyone experienced this situation and identified a root cause?"

I think this is a knowledge gap, either on your side or with whoever does your virtualisation design/implementation. If you have two VLANs/Port-Groups on your vSwitch, they absolutely can have the same IP traffic going on in them. The vSwitch will switch traffic between vNICs within either VLAN/Port-Group, but never between them. Obviously your uplinks need to end up somewhere, typically an L3 switch or router, and at that point you will have a problem because of the duplicate IPs; you'd have to work around that via NAT or similar, but the design itself is perfectly workable. Where and how do you expect to deal with the range duplication?
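To make the per-VLAN forwarding behaviour concrete, here is a minimal sketch (not VMware's actual code, just an illustration of a plain L2 switch) showing that the forwarding decision is keyed on (VLAN, destination MAC) and never looks at the IP header, which is why two port groups can carry the same subnet without colliding:

```python
# Minimal sketch of an L2-only virtual switch: the MAC table is scoped per VLAN,
# so overlapping IP subnets in different VLANs never interact.
from collections import defaultdict

class L2Switch:
    def __init__(self):
        self.mac_table = defaultdict(dict)  # vlan -> {mac: port}

    def learn(self, vlan, src_mac, port):
        # Source learning happens per VLAN, independent of any IP addressing.
        self.mac_table[vlan][src_mac] = port

    def forward(self, vlan, dst_mac):
        # Known unicast: forward to the learned port in *this* VLAN only.
        # Unknown unicast / broadcast: flood, but only within the same VLAN.
        return self.mac_table[vlan].get(dst_mac, f"flood within VLAN {vlan}")

sw = L2Switch()
sw.learn(10, "aa:aa:aa:aa:aa:01", "vnic-customerA")  # CustomerA VM, 192.168.1.10
sw.learn(11, "bb:bb:bb:bb:bb:01", "vnic-customerB")  # CustomerB VM, also 192.168.1.10
print(sw.forward(10, "bb:bb:bb:bb:bb:01"))  # floods in VLAN 10 only; never crosses into VLAN 11
```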

Chopper3
  • The router does not have problems with the overlapping subnets because they are in separate VRFs (Each customer has a separate VRF) – Peter Smallwood Apr 13 '18 at 19:01

"In a multi-tenant virtualised LAN environment it is likely that different customers will have VMs in the same subnet (eg 192.168.1.0/24) and be hosted on a single virtual server environment."

It is definitely possible to have this kind of layer-3 setup in a multi-tenant environment. However, you must ensure the upstream switches and firewalls can handle the same subnets being routed through those devices. For example, CustomerA could have VLAN 10 with the subnet 10.10.10.0/24 and CustomerB could have VLAN 11 with the subnet 10.10.10.0/24. VMware will not care and won't have a problem doing that (or, alternatively, you can have an NSX environment with virtual wires configured for this setup). However, your upstream switches/routers/firewalls must be able to accommodate that setup. An example of this is using VMware's vCloud Director with NSX. Each tenant has full control of their own subnets, and many of them may overlap. Using NSX virtual wires, VMware keeps these networks isolated from each other. Provided the upstream switches either support VXLAN or are strictly layer 2, this works fine. (Source: I have worked on this kind of configuration for years.)
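To illustrate why the overlap is harmless with NSX-style virtual wires, here is a rough sketch (not NSX's real implementation; the VNI numbers and VTEP names are made up) of a VXLAN-style lookup: every forwarding entry is scoped by the tenant's VNI, and the physical underlay only ever sees the outer encapsulation, never the overlapping inner addresses.

```python
# Toy VXLAN-style lookup: identical inner addresses are disambiguated by the VNI.
forwarding = {
    # (vni, inner_mac) -> remote VTEP hosting the destination VM
    (5001, "aa:aa:aa:aa:aa:10"): "vtep-esxi-host1",  # CustomerA, 10.10.10.10
    (5002, "aa:aa:aa:aa:aa:10"): "vtep-esxi-host2",  # CustomerB, 10.10.10.10 (same inner subnet, different VNI)
}

def encapsulate(vni, inner_mac):
    vtep = forwarding.get((vni, inner_mac))
    # The outer packet is addressed to the VTEP, so the underlay needs no
    # knowledge of (and no uniqueness in) the tenant subnets.
    return {"outer_dst": vtep, "vni": vni, "payload": f"frame for {inner_mac}"}

print(encapsulate(5001, "aa:aa:aa:aa:aa:10"))
print(encapsulate(5002, "aa:aa:aa:aa:aa:10"))
```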

"The overlapping subnets are not a problem for the customers because they do not need to communicate with each other. But when these servers are virtualised on the same virtual server environment (eg Vmware or Sparc) they are no longer able to send traffic out on the wire even though the two servers are on different VLANs (only the initial ARP request is seen)."

Sounds like a network issue, not an issue at the virtualization layer.

"I would have expected that VLAN separation would allow the VSwitch to handle the traffic for each VM separately despite the overlap. A physical switch does this by considering each VLAN to be a separate Layer-2 domain and does not use IP header information in it's forwarding decision. However there must be something (possibly defined in the standards) that causes the VSwitch to ignore the different VLANs in it's forwarding decision and instead inspect the IP header and make the false assumption that it can satisfy the connectivity requirement by switching locally. Does anyone know if there is a standards-based reason why this happens or has experience of this situation and identified a root cause?"

The vSwitch does handle traffic for each VLAN/port group separately, provided the port groups are VLAN tagged appropriately. However, again, the upstream switches must accommodate that. For example, if your physical switching is purely Layer 2, the Layer-3 addressing you use inside your virtualization environment is entirely up to you. Chances are your switches and routers are Layer 3, which means your hypervisors aren't the only things seeing those subnets, and that upstream layer is likely where your issues are coming from. If that is the case, you must use separate subnets, and your network team should be able to help set that up for you.
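If you want to verify the tagging from the VMware side, a short pyVmomi sketch along these lines will list each standard-vSwitch port group and its VLAN ID, so you can confirm CustomerA's and CustomerB's port groups really do carry different tags. The host name and credentials are placeholders, and this only covers standard vSwitches, not distributed switches:

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Lab-only: skip certificate validation. Use proper certs in production.
ctx = ssl._create_unverified_context()
si = SmartConnect(host="esxi.example.local", user="root", pwd="password", sslContext=ctx)
content = si.RetrieveContent()

# Walk every host and print its standard-vSwitch port groups with their VLAN IDs.
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
for host in view.view:
    for pg in host.config.network.portgroup:
        # VLAN 0 = untagged, 1-4094 = normal access VLAN, 4095 = pass all tags (VGT).
        print(f"{host.name}: {pg.spec.name} -> VLAN {pg.spec.vlanId} (vSwitch {pg.spec.vswitchName})")
view.Destroy()
Disconnect(si)
```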

  • Thanks for your informative answer. In my example: Layer 3 is on the external switches where segregation between the subnets is ensured by the use of different VRFs. I am encouraged to hear of your experiences and I will seek to identify a network problem somewhere. – Peter Smallwood Apr 13 '18 at 20:41