
I am configuring an internal build system using TeamCity with VMware vSphere. Once configured, the build server is supposed to start build agent VMs through the vSphere API. I've got to the point where the TeamCity build server spins up the build agent VMs I need, but there is a problem.

When the build server detects that it needs several agent VMs, it spins them up very quickly, and in roughly 80% of cases those VMs end up with the same IP address. If I start the build agents manually with a small pause between the calls, the VMs get unique IP addresses.

The duplicate IP addresses cause many networking issues. Here's a screenshot from two build agent VMs showing that their IPv4 addresses are the same.

I think vSphere is using a Cisco Meraki box that provides the DHCP service, but I don't have access to it. I've spent a few days narrowing this issue down to the screenshot above, but I am not sure where to go from here. I thought DHCP was supposed to handle this situation just fine, so I must have misconfigured something.

My build agent VM is Ubuntu 20.04 LTS, and it did not have DHCP pre-configured when I made the snapshot. I did not run any scripts to prepare the image for snapshotting; it's more or less vanilla Ubuntu with Docker installed, as all our builds are containerised. I am using cloned VMs, not templates.

Could someone please point me in the right direction?

  • It's down to the Guest OS to ensure it gets a unique IP, even when using VM Guest Customisation that's mostly a Guest VM responsibility - the vSwitch is just that, a L2 switch, it has nothing to do with IP. – Chopper3 Sep 17 '20 at 11:17

2 Answers


My build agent VM is Ubuntu 20.04 LTS, it did not have DHCP pre-configured when I made a snapshot.

It should have.
If the VM had a static IP address when you took the snapshot, all clones created from that snapshot will try to use the same network configuration when they boot. That should not work at all, no matter how long you wait between deployments.
What I think is happening is this: when a VM starts and finds that its IP address is already in use, it automatically switches to DHCP to obtain a new one; but if you start two of them at the same time, neither detects an IP conflict and both just keep their existing config.
You should configure the base image for DHCP before cloning it.
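For reference, a DHCP-enabled netplan config in the base image could look roughly like this; a minimal sketch assuming the interface name (ens160) and netplan file path from the asker's follow-up answer below, so adjust both to your setup:

sudo nano /etc/netplan/00-installer-config.yaml
---
network:
  version: 2
  renderer: networkd
  ethernets:
    ens160:
      dhcp4: true
---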


Not sure if this is ideal, but I will describe what I did to fix this. It is a hack, so please don't use it unless everything else has failed.

Enable netplan, but disable IPv4 DHCP. For some reason, when I started several instances cloned from the same image, some of them got the same IP:

sudo nano /etc/netplan/00-installer-config.yaml
---
network:
  version: 2
  renderer: networkd
  ethernets:
    ens160:
      dhcp4: false
---
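If you edit this file on an already-running VM rather than only in the base image, the change doesn't take effect until netplan applies it (a reboot also works):

sudo netplan apply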

Add a cron job that configures the IPv4 address at boot time:

sudo crontab -e
-- 
@reboot /usr/bin/bash /boot-config.sh > /boot-config.log
--
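One thing to be aware of: dhclient -v writes most of its verbose output to stderr, so the redirect above will likely miss it. Redirecting stderr as well should capture everything in the log:

--
@reboot /usr/bin/bash /boot-config.sh > /boot-config.log 2>&1
--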

Create the boot config script:

sudo touch /boot-config.sh
sudo chown root:root /boot-config.sh
sudo chmod +x /boot-config.sh

It's important to use the full path for some programs, as $PATH does not include /usr/sbin, for example, when cron runs the script at boot time.

sudo nano /boot-config.sh
---
#!/bin/bash

echo "*********************"
echo "Boot config script"
echo "*********************"

echo "PATH: " $PATH
echo "Running via " $SHELL
echo "Current working directory " $(pwd)

echo ""
echo "Releasing IP address for ens160"
/usr/sbin/dhclient -v -r ens160

echo "Removing DHCP lease files"
rm /var/lib/dhcp/*

echo "Generating new machine id"
echo "Old id: " $(cat /etc/machine-id)
rm /etc/machine-id
systemd-machine-id-setup
echo "New id: " $(cat /etc/machine-id)

echo "Requesting new IP"
/usr/sbin/dhclient -v ens160

echo ""
echo "Finished"
---

You should be able to see the result in the log file:

sudo cat /boot-config.log

PS: marking this as the answer, but https://serverfault.com/a/1034226/82856 was helpful for troubleshooting the problem.
