I'm setting up a server running KVM/QEMU, with Linux on the host and on every guest. We're going to use it for web development, Git, a VoIP PBX, and so on. (We were previously using XenServer and Windows Server 2016, but I'm a Linux fan.) I've run into a problem where virtual machines seemingly at random lose their network connection, or appear to go to sleep. I can't pin down what's causing it.
I've looked through a lot of forums and posts, including here on Server Fault, but nothing quite matches what I'm trying to do. I'll attach an image of our network setup below. We have two locations, each with a firewall, and a VPN between them. The machine in question is a Dell PowerEdge R710. I've installed Ubuntu 18.10 and KVM/QEMU on it as the host OS (18.10 because in 18.04 Virtual Machine Manager didn't show all network connections). I use Virtual Machine Manager on my laptop (Dev Computer 1) over SSH to install and monitor new VMs.
I have six guest VMs, each running either Ubuntu 18.04 or Debian 9 (our VoIP PBX), and they all work great apart from the occasional network hiccup. All of them, and the host itself, connect through a bridge on the host that sits on top of a bond: all four NICs are bonded, and the bond is the bridge's member interface. I'm using netplan for the network configuration, and I'll post the host's config YAML below. The guest VMs use static IP configurations that simply set an address on the default "ens3" interface through netplan; I can post those too if it will help.
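For reference, a guest config of the kind described above would look roughly like this. It is an illustrative sketch, not a copy of an actual guest's file: the address 192.168.5.30 is a placeholder, and the gateway and DNS servers are assumed to match the host's.

```yaml
# Illustrative guest netplan config (placeholder address, not an actual guest)
network:
  version: 2
  renderer: networkd
  ethernets:
    ens3:                              # default virtio NIC name in the guest
      dhcp4: false
      addresses: [192.168.5.30/24]     # hypothetical static IP on the 192.168.5.0/24 LAN
      gateway4: 192.168.5.1            # assumed: same firewall/gateway as the host
      nameservers:
        addresses: [192.168.1.6, 1.1.1.1]
```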
Some interesting things I've noticed:
- I can always SSH into the host machine; it never seems to lose its connection.
- When one of the six VMs loses its network connection, I can still SSH into it from the host, but the session sometimes hangs for a while as the connection is re-established.
- If I SSH into the affected VM from the host and ping the gateway (firewall), it snaps out of it and we can connect to it again.
- Occasionally the guest VMs can't see each other, but if I SSH into whichever one can't see the other and run a ping, it usually starts working after a few "Destination Host Unreachable" messages.
I can provide any other command output or logs needed to diagnose this further, and I'd really appreciate it if anyone who knows more about this could take a look. I'm a huge Linux fan and want this to work the way I know it can, but these random disconnects are not making this solution look very good. Thanks to anyone who takes the time to read this!
Host machine netplan configuration:
```yaml
network:
  version: 2
  renderer: networkd
  ethernets:
    eno1:
      dhcp4: false
      dhcp6: false
    eno2:
      dhcp4: false
      dhcp6: false
    eno3:
      dhcp4: false
      dhcp6: false
    eno4:
      dhcp4: false
      dhcp6: false
  bonds:
    bond0:
      interfaces:
        - eno1
        - eno2
        - eno3
        - eno4
      addresses: [192.168.5.20/24]
      dhcp4: false
      gateway4: 192.168.5.1
      nameservers:
        addresses: [192.168.1.6,1.1.1.1]
  bridges:
    br0:
      addresses: [192.168.5.21/24]
      dhcp4: false
      gateway4: 192.168.5.1
      nameservers:
        addresses: [192.168.1.6,1.1.1.1]
      interfaces:
        - bond0
```