A few days ago a server I manage had a kernel panic after 400+ days of uptime. I rebooted it and it ran for about two days, then it hit an "oops: cpu#n stuck for 61s" for various values of n. I rebooted again, and today the original kernel panic came back. The trace is below (retyped manually, so addresses are omitted):
Kernel panic - not syncing: Fatal exception in interrupt
Pid: 0, comm: swapper Tainted: G D 2.6.32-41-server #89-Ubuntu
Call Trace:
<IRQ> panic
oops_end
die
do_general_protection
? consume_skb
general_protection
? put_page
skb_release_data
__kfree_skb
consume_skb
dev_kfree_skb_any
sky2_tx_complete
sky2_status_intr
? __queue_work
sky2_poll
net_rx_action
__do_softirq
? handle_IRQ_event
call_softirq
do_softirq
irq_exit
do_IRQ
ret_from_intr
<EOI> ? mwait_idle
? atomic_notifier_call_chain
? cpu_idle
? start_secondary
RIP put_page
The OS is Ubuntu 10.04.4 (64-bit). Since the server has always worked and nothing was changed before the panics, I suspect a hardware fault. Before the last reboot I ran a full memtest, which passed, as well as a full fsck just to be sure. Since the panic involves sky2 (the Marvell network controller driver), could it be a NIC problem? Is there something I have overlooked? Note that between failures everything works perfectly (no errors in the logs, no dropped packets, no slowdowns).
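For anyone who wants to repeat the "no dropped packets" check, here is a minimal sketch of how the per-interface error and drop counters could be read from `/proc/net/dev`. The helper names `parse_net_dev` and `noisy_interfaces` are made up for illustration, and the sketch assumes the usual 16-column layout of that file (8 receive fields followed by 8 transmit fields).

```python
def parse_net_dev(text):
    """Parse /proc/net/dev-style text into per-interface error/drop counters.

    Assumes the usual layout: 8 receive fields followed by 8 transmit fields.
    """
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) < 16:
            continue  # unexpected layout, skip rather than guess
        values = [int(f) for f in fields[:16]]
        counters[name.strip()] = {
            "rx_errs": values[2],
            "rx_drop": values[3],
            "tx_errs": values[10],
            "tx_drop": values[11],
        }
    return counters


def noisy_interfaces(counters):
    """Return the sorted names of interfaces with any non-zero counter."""
    return sorted(
        name for name, stats in counters.items()
        if any(value != 0 for value in stats.values())
    )
```

On the server itself this could be used as `noisy_interfaces(parse_net_dev(open("/proc/net/dev").read()))`; an empty list would be consistent with the clean counters described above, although clean counters alone do not rule out a failing NIC.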
Thanks for any pointers.