Great pointers by @zany
In my case, I was trying to configure a Debian 11 generic cloud image with cloud-init and a static IP on my KVM host (using the dmacvicar libvirt Terraform provider).
My network-config file was:
version: 2
ethernets:
  ens3:
    dhcp4: false
    addresses: [10.1.0.100]
    gateway4: 10.1.0.1
    nameservers:
      addresses: [10.1.0.1, 1.1.1.1]
      search: [home.lab]
Then I was surprised that during VM creation the interface requested a DHCP lease (journalctl is your friend) before cloud-init kicked in and configured the interface with my static settings, exactly as the OP described.
After a minute or so, the "mysterious" dhclient gave up waiting for an offer (expected, since DHCP is disabled on my libvirt network) and was left running in the background. The boot sequence then continued and cloud-init kicked in, rendering the correct static config in /etc/network/interfaces.d/50-cloud-init.cfg. At that point the interface got the expected static IP (ip address show proves it, and you can also ping things by IP), but DNS resolution was left broken. I guess it's a side effect of the dhclient fiasco.
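One quick way to confirm the "IP works, names don't" symptom is to look at which nameservers the resolver is actually configured with. This is only a sketch: it assumes the resolver reads /etc/resolv.conf, and list_nameservers is a hypothetical helper name, not part of cloud-init or ifupdown.

```shell
# Hypothetical helper: print the nameserver addresses found in a
# resolv.conf-style file (defaults to /etc/resolv.conf, an assumption
# about where the resolver configuration lives on this image).
list_nameservers() {
    file="${1:-/etc/resolv.conf}"
    # Only lines whose first field is "nameserver" are relevant.
    awk '$1 == "nameserver" { print $2 }' "$file"
}
```

Running list_nameservers on the VM and comparing the output with the nameservers from network-config shows whether the static DNS settings ever made it into the resolver configuration.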
Well, after some digging, it turns out that the /etc/network/interfaces file, in addition to the usual source-directory /etc/network/interfaces.d, also sources the extra directory /run/network/interfaces.d/. To my surprise, that /run directory contained an interface definition for ens3 that configures it in dhcp mode!
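If you want to track down a stray definition like this yourself, something along these lines may help. It is a rough sketch that assumes the files use the usual ifupdown stanza syntax (iface NAME inet dhcp); find_dhcp_stanzas is a hypothetical helper name.

```shell
# Hypothetical helper: list every file under the given directories that
# configures the named interface in dhcp mode (ifupdown stanza syntax assumed).
find_dhcp_stanzas() {
    iface="$1"
    shift
    for dir in "$@"; do
        # Skip directories that do not exist on this system.
        [ -d "$dir" ] || continue
        # -l prints only file names, -r recurses into the directory.
        grep -lEr "^[[:space:]]*iface[[:space:]]+${iface}[[:space:]]+inet[[:space:]]+dhcp" "$dir" || true
    done
    return 0
}
```

On the VM it would be called as find_dhcp_stanzas ens3 /etc/network/interfaces.d /run/network/interfaces.d, and any file it prints is a candidate source of the unexpected DHCP request.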
So now that I knew where the unexpected dhcp request was coming from, it was a matter of disabling it, since it was conflicting with the correct settings in /etc/network/interfaces.d/50-cloud-init.cfg.
Unfortunately, the initial DHCP request happens before cloud-init kicks in, so there is really no easy way to prevent dhclient from wasting a precious minute or so waiting for an offer that will never come.
What I was able to accomplish, though, was fixing DNS resolution by using the following bootcmd: block in my user-data:
bootcmd:
- cloud-init-per once down ifdown ens3
- cloud-init-per once bugfix rm /run/network/interfaces.d/ens3
- cloud-init-per once up ifup ens3
In the above commands, I'm bringing the interface down, which stops the dormant dhclient process; then I'm removing the interface definition file that initially sets ens3 in dhcp mode; and finally I'm bringing the ens3 interface back up, which applies what's set in /etc/network/interfaces.d/50-cloud-init.cfg like a champ.
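For reference, the same three steps can be sketched as a small guarded shell function that does nothing when the stray file is absent. This is an illustration only, not what I actually ran: reset_iface, IFDOWN_CMD and IFUP_CMD are hypothetical names, and the overridable commands exist purely so the logic can be dry-run outside the VM.

```shell
# Hypothetical helper mirroring the three bootcmd steps.
# IFDOWN_CMD / IFUP_CMD default to the real tools but can be overridden
# (an assumption made here for dry runs, not something ifupdown provides).
IFDOWN_CMD="${IFDOWN_CMD:-ifdown}"
IFUP_CMD="${IFUP_CMD:-ifup}"

reset_iface() {
    iface="$1"
    run_dir="${2:-/run/network/interfaces.d}"
    stale="$run_dir/$iface"

    # Nothing to do when the early-boot dhcp definition is not there.
    [ -e "$stale" ] || return 0

    # Take the interface down while the dhcp definition still exists,
    # so the leftover dhclient gets stopped along with it.
    "$IFDOWN_CMD" "$iface" || true

    # Remove the definition that put the interface in dhcp mode.
    rm -f "$stale"

    # Bring the interface back up with the remaining (static) config.
    "$IFUP_CMD" "$iface"
}
```

Calling reset_iface ens3 would then be equivalent to the three cloud-init-per lines, with the difference that a second call is a no-op because the file is already gone.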
With that, the subsequent cloud-init stages in the initial boot process were able to reach the internet by name. That was critical for later stages such as the packages: block to succeed, since it needs working DNS to resolve the apt repo server name.
Here's the more detailed user-data excerpt:
bootcmd:
- cloud-init-per once ifdown ifdown ens3
- cloud-init-per once bugfix rm /run/network/interfaces.d/ens3
- cloud-init-per once ifup ifup ens3
packages:
- qemu-guest-agent
- locales-all
package_update: true
package_upgrade: true
package_reboot_if_required: true
runcmd:
- [ systemctl, start, qemu-guest-agent ]
final_message: "The system is finally up, after $UPTIME seconds"
Despite not being on Debian 10, the issue sounded so familiar that I thought I'd share my experience in case you face it in newer releases.