
I was just working on a local Linux server in my office, connected to it via SSH, and changed some network settings. Specifically, I added a simple network bridge that replaced the previous Ethernet connection (eth0). In both cases the network address is a static IPv4 address.
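For context, a minimal sketch of what such a systemd-networkd setup typically looks like; the bridge name br0 and the addresses (documentation range) are assumptions, not my actual values:

# /etc/systemd/network/br0.netdev -- define the bridge device
[NetDev]
Name=br0
Kind=bridge

# /etc/systemd/network/br0.network -- static IPv4 on the bridge
[Match]
Name=br0

[Network]
Address=192.0.2.10/24
Gateway=192.0.2.1

# /etc/systemd/network/eth0.network -- enslave eth0 to the bridge
[Match]
Name=eth0

[Network]
Bridge=br0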

After making those changes I restarted the network daemon with systemctl restart systemd-networkd, was locked out, and could not SSH back into the machine.

Luckily, I had access to the physical console. While restarting the network did bring up the new bridge with the correct address, it did not remove that address from eth0, even though all configuration settings are correct. So I had to manually run ip addr flush dev eth0, and I was back up and running.
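From the physical console, the recovery was essentially:

# Drop the stale address(es) still bound to the old interface
ip addr flush dev eth0
# Verify that the static address now lives only on the bridge
ip addr show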

I guess if that had been a root server in a remote location instead of a local machine, I would have been in serious trouble.

What should I have done differently, what is the right approach here?

Update: From the two answers provided so far, I see that I should have been clearer. I am fully aware of all the hardware options for retaining access to my machine. Because I have and use them, I feel comfortable making certain changes at the risk of something bad happening. It's a bit of a hassle, but I can just log in through the serial console and all is well again. But I was wondering: what if I didn't have them? How would the rest of you go about changing network settings that can theoretically disconnect you?

And frankly, I also wonder, in a very concrete way, why my eth0 interface kept the old IP address even though I restarted the network service with the new settings. That just doesn't look like desired behaviour to me.

vic

2 Answers


There are at least two ways to do this differently:

  1. Remote console (HP iLO, Dell DRAC, ...) allowing you access via its own NIC and its own IP, which is independent of the main OS settings. If you err, you can just remotely take the console and fix things.
  2. Set up a reboot to a safe working state on a timer. Make your changes, then kill the safety timer.

E.g.

sleep 900 && shutdown -r now "I messed up. Rebooting"

(In a new shell)
ifconfig / ip whatever

Then, once the changed state is confirmed working, kill the sleep to cancel the reboot.

PS1: sleep and shutdown are combined this way so as not to spam users with shutdown warnings for the full 15 minutes. (Though you could just run shutdown -r +15 and later cancel the pending shutdown with shutdown -c.)

PS2: Notice that it is sleep && shutdown and not sleep ; shutdown. With &&, interrupting the sleep (e.g. with Ctrl-C) cancels the shutdown as well; with ;, the shutdown would run immediately anyway.
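A minimal sketch of that shutdown-based variant; unlike a sleep running in a foreground shell, the scheduled shutdown also survives your SSH session dying with the connection:

# Schedule a failsafe reboot 15 minutes out (survives a dropped SSH session)
shutdown -r +15 "Failsafe reboot: network change in progress"
# ... apply and test the new network configuration ...
# If you can still log in afterwards, cancel the pending reboot:
shutdown -c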

Hennes
  • You can add, some minutes before the reboot, a command that restores the stable networking. Do not forget to still schedule the reboot in case your restore command fails for some reason. – allo Oct 24 '15 at 21:27
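A sketch of allo's suggestion, assuming a hypothetical restore script at /usr/local/sbin/restore-network.sh and a running atd:

# Try to restore the known-good configuration after 10 minutes
echo /usr/local/sbin/restore-network.sh | at now + 10 minutes
# Still schedule the reboot as a last resort, in case the restore itself fails
shutdown -r +15 "Failsafe reboot after network change"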

The least intrusive ways to deal with such a problem are those which do not require a reboot.

A serial console is one way to get access. Other more specialized hardware also exists for getting access to a host with no functional networking.

If you don't have any such out-of-band access, it is worth trying alternative means of communicating with the host through the network. The first alternative is to take advantage of a dual stack, which gives you a bit of redundancy: if you mess up your IPv4 configuration, you might still be able to reach the host through IPv6, and vice versa.
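For example (the address below is from the documentation prefix, not a real host):

# IPv4 is misconfigured but IPv6 still routes: force SSH over IPv6
ssh -6 admin@2001:db8::10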

If you mess up both IPv4 and IPv6, you may still be able to reach the host through IPv6 link-local communication. The way IPv6 link-local communication works makes it a bit more robust against misconfigured networking, so there is a good chance it will work. This method only works if you have access to at least one other functional host on the same network segment as your target.
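A sketch of how that looks from a neighbouring host; the link-local address is made up, and eth0 is the assumed interface name on that host:

# Discover live neighbours on the segment via the all-nodes multicast group
ping6 -c 2 -I eth0 ff02::1
# SSH to the target's link-local address; %eth0 selects the outgoing interface
ssh admin@fe80::1234:56ff:fe78:9abc%eth0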

The more intrusive way to deal with the problem is to reboot. Even if you don't have the hardware for full remote access, you may still have hardware that can trigger a reboot remotely. This could be hardware that pulls the reset line on the motherboard, or hardware that power-cycles the host.
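If the machine has a baseboard management controller reachable on its own management IP, a standard IPMI client can do this out of band; the hostname and credentials here are placeholders:

# Power-cycle the host through its BMC, independent of the OS network config
ipmitool -I lanplus -H bmc.example.com -U admin -P secret chassis power cycle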

If no hardware for out-of-band administration exists on the host, you may need to ask on-site staff to help. In such cases it is certainly easier to ask them to reboot the machine than to ask them to debug the network connectivity.

Once the machine reboots, you need to somehow ensure that it actually comes back online. If the bad change was only in memory and rebooting returns to a known-good configuration, no special care may be needed. In more problematic cases it may be useful to have the host configured to attempt PXE boot first and boot from local disk only if no PXE server responds on the network. However, this approach is only sensible if you know that you can trust the network.

The most intrusive option is to apply whatever procedures you have in place for handling a host that is completely lost. Those procedures are usually intended for hardware failure, or worse, such as the building burning to the ground, but they can be applied to something as trivial as a misconfigured network. (As intrusive as this approach is, it is rarely the preferred solution.)

kasperd
  • I guess my answer is close to your 'most intrusive'. For SF posters I do assume iLO or DRAC availability, though. – Hennes Oct 27 '15 at 23:17