Why didn't my Linux server start with an active network connection?

Question

I booted my Linux server today via WOL (Wake-on-LAN). When it came up, I couldn't SSH into it. I checked the server: it had booted into the OS but wasn't reachable on the network. My router showed it as an active client (this may have been a hangover from it being up a few minutes previously) but with no connections. So I rebooted it locally, and the next time it booted as normal with an active network connection. I grepped dmesg for 'eth' and compared the output with that of the successful boot. See below:

Boot with networking:

[    1.331587] skge 0000:01:04.0: eth0: addr 00:0e:a6:15:17:76
[    1.353667] forcedeth: Reverse Engineered nForce ethernet driver. Version 0.64.
[    1.353930] forcedeth 0000:00:04.0: PCI INT A -> Link[APCH] -> GSI 22 (level, high) -> IRQ 22
[    1.353937] forcedeth 0000:00:04.0: setting latency timer to 64
[    1.872912] forcedeth 0000:00:04.0: ifname eth1, PHY OUI 0x732 @ 1, addr 00:0e:a6:15:0e:a1
[    1.872917] forcedeth 0000:00:04.0: timirq lnktim desc-v1
[   16.614650] eth1: no link during initialization.
[   16.615258] ADDRCONF(NETDEV_UP): eth1: link is not ready
[   16.649234] skge 0000:01:04.0: eth0: enabling interface
[   16.668500] ADDRCONF(NETDEV_UP): eth0: link is not ready
[   18.416816] skge 0000:01:04.0: eth0: Link is up at 100 Mbps, full duplex, flow control both
[   18.417081] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   28.592014] eth0: no IPv6 routers present

Boot without networking:

[    1.293152] forcedeth: Reverse Engineered nForce ethernet driver. Version 0.64.
[    1.293484] forcedeth 0000:00:04.0: PCI INT A -> Link[APCH] -> GSI 22 (level, high) -> IRQ 22
[    1.293491] forcedeth 0000:00:04.0: setting latency timer to 64
[    1.353544] skge 0000:01:04.0: bad (zero?) ethernet address in rom
[    1.354130] skge 0000:01:04.0: eth0: addr 00:00:00:00:00:00
[    1.812906] forcedeth 0000:00:04.0: ifname eth1, PHY OUI 0x732 @ 1, addr 00:0e:a6:15:0e:a1
[    1.812911] forcedeth 0000:00:04.0: timirq lnktim desc-v1
[   17.384526] eth1: no link during initialization.
[   17.396719] ADDRCONF(NETDEV_UP): eth1: link is not ready

The obvious culprit seems to be the line `[    1.353544] skge 0000:01:04.0: bad (zero?) ethernet address in rom`, but I don't know what would cause this to happen.

Here's some of the output around this line:

[    1.353449] ACPI: PCI Interrupt Link [APC2] enabled at IRQ 17
[    1.353456]   alloc irq_desc for 17 on node -1
[    1.353459]   alloc kstat_irqs on node -1
[    1.353470] skge 0000:01:04.0: PCI INT A -> Link[APC2] -> GSI 17 (level, high) -> IRQ 17
[    1.353533] skge: 1.13 addr 0xe8008000 irq 17 chip Yukon-Lite rev 7
[    1.353544] skge 0000:01:04.0: bad (zero?) ethernet address in rom
[    1.354130] skge 0000:01:04.0: eth0: addr 00:00:00:00:00:00
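For anyone wanting to spot this condition automatically, here is a minimal Python sketch that scans dmesg text for the interface address lines shown above and flags any interface that came up with an all-zero MAC. The function names and the regular expression are my own assumptions, written only against the log formats quoted in this question; other drivers may print their addresses differently.

```python
import re

# Assumed pattern, based only on the skge/forcedeth lines quoted above, e.g.
# [    1.354130] skge 0000:01:04.0: eth0: addr 00:00:00:00:00:00
ADDR_LINE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<driver>\S+)\s+.*?"
    r"(?P<iface>eth\d+)\b.*?addr\s+"
    r"(?P<mac>(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2})"
)

ZERO_MAC = "00:00:00:00:00:00"


def find_interfaces(dmesg_text):
    """Return (timestamp, driver, interface, mac) for each address line."""
    results = []
    for line in dmesg_text.splitlines():
        m = ADDR_LINE.search(line.strip())
        if m:
            results.append(
                (float(m["ts"]), m["driver"], m["iface"], m["mac"].lower())
            )
    return results


def zero_mac_interfaces(dmesg_text):
    """Return only the entries whose MAC address is all zeros."""
    return [row for row in find_interfaces(dmesg_text) if row[3] == ZERO_MAC]


# Lines copied from the failed boot above.
FAILED_BOOT = """
[    1.353544] skge 0000:01:04.0: bad (zero?) ethernet address in rom
[    1.354130] skge 0000:01:04.0: eth0: addr 00:00:00:00:00:00
[    1.812906] forcedeth 0000:00:04.0: ifname eth1, PHY OUI 0x732 @ 1, addr 00:0e:a6:15:0e:a1
"""

for ts, driver, iface, mac in zero_mac_interfaces(FAILED_BOOT):
    print(f"{iface} ({driver}) reported a zero MAC at {ts}s")
```

On a live system the input could come from the output of `dmesg` rather than a pasted string.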

Has anyone any suggestions?

In case it's pertinent, the Linux server has a static IP defined on the router. When the server starts up, it should request an IP address from the DHCP server, which in this case is my router, and it should always get the IP address 192.168.2.103.

Does your WOL command framework include any PXE or DHCP elements? – BMDan, Mar 23 '11 at 21:34
The router has a static IP address defined for the Linux host's MAC. The WOL command is sent to `255.255.255.255`. – conorgriffin, Mar 23 '11 at 21:56
This is obviously not a common issue, but can anyone even throw some speculation out there? My own feeling is that maybe it's a BIOS issue. – conorgriffin, Mar 29 '11 at 21:34
Any luck from hardcoding a MAC via `MACADDR=` in the ifcfg file? (Note that `HWADDR=`, despite its resemblance to the ifconfig command's "hwaddr" parameter, is quite different, though it may also be worth trying depending on how weird this all is.) Also, once the machine has booted, what's the output of `lspci`? – BMDan, Apr 12 '11 at 12:51

score 1 · Answer 1 · answered Mar 29 '11 at 21:52

This has all the hallmarks of a driver-related problem. Perhaps this motherboard is too new for the kernel version of your distro of choice. It also looks like a timing issue of some kind relating to when certain modules load into the kernel: in the working output the skge line appears before forcedeth loads (1.33 s vs. 1.35 s), while in the non-working output forcedeth loads first (1.29 s) and skge follows (1.35 s). Perhaps that's where things are failing.
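One way to check the load-order idea is to compare when each driver first shows up in the two logs. The following is a rough Python sketch, not a definitive diagnostic: the helper names are hypothetical, and it only sees lines that survived the `grep` for 'eth', so the timestamps are those of the first matching line rather than the true module load time.

```python
import re

# Matches the timestamp and a leading driver name of interest.
# Only the two drivers from the question are considered (an assumption).
DRIVER_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(skge|forcedeth)\b")


def first_seen(dmesg_text):
    """Map each driver to the timestamp of its first logged line."""
    seen = {}
    for line in dmesg_text.splitlines():
        m = DRIVER_LINE.match(line.strip())
        if m and m.group(2) not in seen:
            seen[m.group(2)] = float(m.group(1))
    return seen


def probe_order(dmesg_text):
    """Return driver names sorted by when they first appear in the log."""
    seen = first_seen(dmesg_text)
    return sorted(seen, key=seen.get)


# First skge and forcedeth lines from each boot, copied from the question.
WORKING = """
[    1.331587] skge 0000:01:04.0: eth0: addr 00:0e:a6:15:17:76
[    1.353667] forcedeth: Reverse Engineered nForce ethernet driver. Version 0.64.
"""

FAILING = """
[    1.293152] forcedeth: Reverse Engineered nForce ethernet driver. Version 0.64.
[    1.353544] skge 0000:01:04.0: bad (zero?) ethernet address in rom
"""

print("working boot order:", probe_order(WORKING))
print("failing boot order:", probe_order(FAILING))
```

If the order differs consistently between good and bad boots across several attempts, that would support the timing theory; a single pair of boots is not enough to conclude anything.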

sysadmin1138

It's not that the hardware is new, anyway. The hardware is quite old, 2002/3 I think. The kernel is up to date, 2.6.35-28-generic. So it may be that the hardware is too old, though I'd be surprised if that were the case. – conorgriffin Mar 30 '11 at 14:26
@griffo This being Linux, it could be that you've lucked into a bit of hardware that is relatively rare and no one bothered to fix the bugs for it. – sysadmin1138 Mar 30 '11 at 17:37
Yeah maybe, although this seems to have started happening only recently. Maybe a bug introduced with a new kernel/patch. – conorgriffin Apr 02 '11 at 15:40
