Debian 10 cloud-init waiting for DHCP on boot with static network configuration

Question

I'm running a Debian 10 (Buster) image (created with build-openstack-debian-image --release buster) together with a cloud-init seed image created by cloud-localds -v --disk-format raw --filesystem iso9660 --network-config=network-config-v2.yaml seed.img user-data.yaml.

The problem is that boot is delayed by waiting for DHCP, even though I have a valid network configuration, which is applied only after this delay.

[    3.619937] cloud-init[210]: Cloud-init v. 20.2 running 'init-local' at Sun, 10 Jan 2021 10:50:20 +0000. Up 3.40 seconds.
[  OK  ] Started Initial cloud-init job (pre-networking).
[  OK  ] Reached target Network (Pre).
         Starting Raise network interfaces...
[  OK  ] Started ifup for eth0.
[     *] A start job is running for Raise network interfaces (35s / 5min 1s)

What can I do to skip this delay?

I can provide more info if needed. Thanks.

# systemd-analyze blame
     1min 2.639s networking.service
           951ms cloud-init-local.service
           773ms cloud-init.service
           657ms cloud-final.service
           540ms cloud-config.service
           421ms dev-vda1.device
           310ms ifupdown-pre.service

My network-config-v2.yaml:

version: 2
renderer: networkd
ethernets:
  eth0:
    match:
      name: e*
    addresses:
      - private.ipv4/24
      - public.ipv4/32
      - ipv6/64
    gateway4: private.ipv4
    routes:
      - to: 0.0.0.0/0
        via: private.ipv4
    gateway6: ipv6
    nameservers:
      addresses:
        - ipv4
        - ipv6
      search: [domain.com]

Hard to tell without seeing your network configuration. Entirely plausible that you have valid network configuration for IPv6 but failed to specify that cloud-init should not attempt (and wait if not immediately successful) to configure IPv4 via DHCP. — anx, Jan 10 '21 at 20:45
@anx I've added network configuration. I hope both IPv4 and IPv6 are configured properly. I've also tried dhcp4 false (and according to documentation, it's Off by default). — Xdg, Jan 12 '21 at 09:32

donhector · Answer 1 · 2022-04-13T14:27:29.040

Great pointers by @zany

In my case I was trying to configure a Debian 11 generic cloud image with cloud-init and a static IP on my KVM host (using the dmacvicar libvirt Terraform provider).

My network-config file was:

version: 2
ethernets:
  ens3:
    dhcp4: false
    addresses: [10.1.0.100]
    gateway4: 10.1.0.1
    nameservers:
      addresses: [10.1.0.1, 1.1.1.1]
      search: [home.lab]

Then I was surprised that during VM creation, the interface was requesting a DHCP lease (journalctl is your friend) before the cloud-init config would actually kick in and configure the interface as per my static settings (exactly as the OP described).

After a minute or so, the "mysterious" dhclient gave up waiting for an offer (that was expected, as DHCP is disabled on my libvirt network) and was then left running in the background. The boot sequence then continued and cloud-init kicked in, rendering the correct static config in /etc/network/interfaces.d/50-cloud-init.cfg. At that point the interface got the expected static IP (ip address show proves that, and you can also ping things by IP), but DNS resolution was left broken. I guess it's a side effect of the dhclient fiasco.

Well, after some digging it turns out that the /etc/network/interfaces file, in addition to the source-directory /etc/network/interfaces.d line, also sources the extra directory /run/network/interfaces.d/. To my surprise, that /run directory contains an interface definition for ens3 where it is being configured in dhcp mode!
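To see which file carries the DHCP stanza, both source directories can be searched in one go. This is only a minimal sketch: the function name find_dhcp_stanzas is made up for illustration, and the two directories in the example call are the ones mentioned above.

```shell
#!/bin/sh
# List every "iface <name> inet dhcp" line found under the given directories.
# find_dhcp_stanzas is a hypothetical helper name, not part of ifupdown.
find_dhcp_stanzas() {
    for dir in "$@"; do
        # Skip directories that do not exist on this system.
        [ -d "$dir" ] || continue
        grep -rHE '^[[:space:]]*iface[[:space:]]+[^[:space:]]+[[:space:]]+inet[[:space:]]+dhcp' "$dir" || true
    done
}

# Example call on a running VM:
# find_dhcp_stanzas /etc/network/interfaces.d /run/network/interfaces.d
```

Each match is printed as file:line, so a hit under /run/network/interfaces.d/ points at the generated file rather than the one cloud-init rendered.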

So now that I knew where the unexpected dhcp request was coming from, it was a matter of disabling it, since it was conflicting with the correct settings in /etc/network/interfaces.d/50-cloud-init.cfg.

Unfortunately, the initial DHCP request happens before cloud-init kicks in, so there is really no easy way to prevent dhclient from wasting a precious minute or so trying to get an offer that will never come.

What I was able to accomplish, though, was fixing DNS resolution by using the following bootcmd: block in my user-data:

bootcmd:
  - cloud-init-per once down ifdown ens3
  - cloud-init-per once bugfix rm /run/network/interfaces.d/ens3
  - cloud-init-per once up ifup ens3

In the above commands, I'm bringing the interface down which stops the dormant dhclient process, then I'm removing the interface definition file that initially sets ens3 in dhcp mode, and finally I'm bringing the ens3 interface back up, which applies what's set in /etc/network/interfaces.d/50-cloud-init.cfg like a champ.

With that, the subsequent cloud-init stages in the initial boot process were able to fully reach the internet by name. That was critical for later stages such as the packages: block to succeed, since it needed working DNS to resolve the apt repo server name.

Here's the more detailed user-data excerpt:

bootcmd:
  - cloud-init-per once ifdown ifdown ens3
  - cloud-init-per once bugfix rm /run/network/interfaces.d/ens3
  - cloud-init-per once ifup ifup ens3

packages:
  - qemu-guest-agent
  - locales-all

package_update: true
package_upgrade: true
package_reboot_if_required: true

runcmd:
  - [ systemctl, start, qemu-guest-agent ]

final_message: "The system is finally up, after $UPTIME seconds"

Despite not being on Debian 10, the issue sounded so familiar that I thought I'd share my experience in case you face it in newer releases.

Thanks for the comprehensive write up. It helped me out. — Beans, Apr 23 '23 at 01:05

zany · Answer 2 · 2021-03-13T10:23:13.260

I encountered the same problem -- using a static network configuration (NoCloud provider meta-data ENI, or network-config v1/v2) does not disable the DHCP client.

It seems a network config is applied from a template (/etc/network/cloud-interfaces-template) before the cloud-init configuration is written.

auto $INTERFACE
allow-hotplug $INTERFACE

iface $INTERFACE inet dhcp

You can test that this template is the culprit by changing the cloud image before first start
(the image has to be patched, because changing the network config in e.g. bootcmd is too late):

qemu-nbd --connect=/dev/nbd0 /tmp/debian-10-genericcloud-amd64-20210208-542.qcow2
fdisk /dev/nbd0 -l
mkdir /tmp/nbd
mount /dev/nbd0p1 /tmp/nbd
sed -i 's/dhcp/manual/' /tmp/nbd/etc/network/cloud-interfaces-template
umount /tmp/nbd
rmdir /tmp/nbd
qemu-nbd --disconnect /dev/nbd0
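
Before touching the image, the substitution can be tried on a copy of the template. The sketch below is a slightly narrower variant of the sed call above: it only rewrites the method on the iface line. The function name patch_template is hypothetical, and sed -i -E assumes GNU sed.

```shell
#!/bin/sh
# Switch an ifupdown template from DHCP to manual mode, touching only the
# "iface ... inet dhcp" line. patch_template is a hypothetical helper name.
patch_template() {
    template=$1
    sed -i -E 's/^([[:space:]]*iface[[:space:]]+.*[[:space:]]inet[[:space:]]+)dhcp[[:space:]]*$/\1manual/' "$template"
}

# Example, with the image mounted as in the commands above:
# patch_template /tmp/nbd/etc/network/cloud-interfaces-template
```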

I still need to find a way to apply this change or prevent the use of this template with cloud-init though.

That template seems to be processed by /etc/network/cloud-ifupdown-helper, so that script could be changed or influenced perhaps.
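
To illustrate what such a helper would produce, here is a rough sketch that renders the template for one interface. It ASSUMES the helper does nothing more than replace the literal string $INTERFACE; I have not confirmed that, and render_template is a made-up name.

```shell
#!/bin/sh
# Render an ifupdown template for one interface by replacing the literal
# string $INTERFACE. ASSUMPTION: the real helper may do more than this.
# render_template is a hypothetical name used only for this sketch.
render_template() {
    template=$1
    iface=$2
    sed 's/\$INTERFACE/'"$iface"'/g' "$template"
}

# Example:
# render_template /etc/network/cloud-interfaces-template eth0
```

Under that assumption, the rendered text for eth0 ends in "iface eth0 inet dhcp", which would explain the DHCP client starting regardless of the cloud-init network config.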

score 1 · Answer 3 · answered Jun 30 '21 at 04:11

I met the same problem.

Here is a better way to resolve it: just set the DHCP timeout to a shorter value.

# virt-edit debian-10-generic-amd64.qcow2 /etc/dhcp/dhclient.conf

timeout 15;

The image then functions correctly both in a NoCloud environment and on a DHCP network.
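
If you would rather script the change than edit by hand, something like the following could be run against a copy of dhclient.conf (or the file inside a mounted image). It is a sketch only: set_dhclient_timeout is a hypothetical helper name, and sed -i -E assumes GNU sed.

```shell
#!/bin/sh
# Set (or add) the "timeout" statement in a dhclient.conf file.
# set_dhclient_timeout is a hypothetical helper; pass the file and seconds.
set_dhclient_timeout() {
    conf=$1
    seconds=$2
    if grep -qE '^[[:space:]]*timeout[[:space:]]+[0-9]+;' "$conf"; then
        # An active timeout line exists: rewrite its value in place.
        sed -i -E "s/^([[:space:]]*timeout[[:space:]]+)[0-9]+;/\\1${seconds};/" "$conf"
    else
        # No active line (for example only a commented-out one): append one.
        printf 'timeout %s;\n' "$seconds" >> "$conf"
    fi
}

# Example:
# set_dhclient_timeout /etc/dhcp/dhclient.conf 15
```

Running it twice leaves a single timeout line, so it is safe to repeat.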

score 0 · Answer 4 · answered Oct 12 '22 at 09:39

Thanks to the great pointers from other answers here, I managed to find a good solution for me - I was not happy to just let DHCP keep running on each boot as it took about 5 minutes(!) for dhcp to argue with my ISP about how many leases I'm allowed to hold!

By adding this to my cloud-init user data it removes any previously-created config as well as disabling the udev rule that calls the debian helper in the first place:

bootcmd:
  # fix the udev rule that ends up creating the bogus config
  - cloud-init-per always fix-debian-autonet rm /etc/udev/rules.d/75-cloud-ifupdown.rules
  # delete any bogus configs already created by previous boots.
  - cloud-init-per always fix-debian-netconfig rm /run/network/interfaces.d/*

I used always since I wanted to fix my existing instance, but it's probably sufficient to use once and only use the first entry, depending on when this gets run in the boot sequence (test results welcome).

I was finding that because I rename my interfaces with the network config, it was causing extra issues with duplicate interface definitions as well. This change means my bootup time has gone from 5 minutes to 12 seconds!
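
One caveat with always: on later boots the files may already be gone, and a plain rm then exits non-zero. A tolerant variant could look like the sketch below. The name remove_debian_autonet is hypothetical, and find with -mindepth/-maxdepth/-delete assumes GNU findutils.

```shell
#!/bin/sh
# Remove the udev rule and any generated interface files, succeeding even
# when they are already gone. remove_debian_autonet is a hypothetical name.
remove_debian_autonet() {
    rules_file=$1
    run_dir=$2
    rm -f "$rules_file"
    # Only plain files directly inside run_dir are deleted.
    find "$run_dir" -mindepth 1 -maxdepth 1 -type f -delete 2>/dev/null || true
}

# Example:
# remove_debian_autonet /etc/udev/rules.d/75-cloud-ifupdown.rules /run/network/interfaces.d
```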

xycainoff · Answer 5 · 2023-02-16T10:26:33.067

The answer by agittins seems best, but in my case the cloud-init "bootcmd" commands in the user-data file were processed after Debian's "75-cloud-ifupdown.rules". So I had to remove those Debian scripts from the disk image (mount the VM storage first, delete the scripts, then unmount):

sudo qemu-nbd --connect=/dev/nbd0 debian-11-genericcloud-arm64-backing.qcow2
sudo mount /dev/nbd0p1 /mnt
sudo rm -v /mnt/etc/udev/rules.d/75-cloud-ifupdown.rules
sudo rm -v /mnt/etc/network/cloud-ifupdown-helper
sudo rm -v /mnt/etc/network/cloud-interfaces-template
sudo umount /mnt
sudo qemu-nbd --disconnect /dev/nbd0

edited Feb 16 '23 at 10:26

answered Nov 23 '22 at 21:51

score 0 · Answer 6 · answered Apr 12 '23 at 13:33

As @donhector has found, this issue still affects the latest Debian 11 cloud images, so it has been left unfixed for well over 2 years.

It is possible to use the bootcmd methods given by him or @agittins, but that means that either the first boot takes ages, or you are fighting the udev script every time. It would be possible to combine these methods, but the net result is that you just end up with the /etc/udev/rules.d/75-cloud-ifupdown.rules file deleted from your image.

So, as @xycainoff did, you can just delete the file from the image before using it, and the problem goes away. As a slightly nicer alternative to qemu-nbd and mounting the filesystem, it is possible to use guestfish to remove the file. The process is basically the same as the guestfish example:

https://docs.openstack.org/image-guide/modify-images.html

As the Debian cloud image doesn't use LVM, the mount is simpler - you just need to use mount /dev/sda1 /, rather than an LVM volume name. Then, just remove the /etc/udev/rules.d/75-cloud-ifupdown.rules file with a variation on point 3 in the example.
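
Put together, the guestfish session could be scripted roughly as follows. Treat it as a sketch: guestfish_rm_rule is a hypothetical helper that only prints the commands, /dev/sda1 is the partition named above, and the image file name in the example is a placeholder.

```shell
#!/bin/sh
# Print the guestfish commands that delete the udev rule from the image.
# guestfish_rm_rule is a hypothetical helper name; /dev/sda1 is the root
# partition named in the text above.
guestfish_rm_rule() {
    cat <<'EOF'
run
mount /dev/sda1 /
rm /etc/udev/rules.d/75-cloud-ifupdown.rules
umount /
EOF
}

# Example (needs guestfish installed; the image name is a placeholder):
# guestfish_rm_rule | guestfish --rw -a debian-11-genericcloud-amd64.qcow2
```

Printing the commands first makes it easy to review them before piping them into guestfish against a real image.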
