On a server heavily used for file downloads, the server stops responding to SSH, HTTP, and ping requests every few hours. It returns to normal after a restart.

The provider's technician guesses that it could be due to a network failure. How can I investigate and possibly resolve this problem?

Here are the last lines of the dmesg log. The server has been restarted twice in the last 24 hours.

[    7.266682] ioatdma 0000:00:16.0: setting latency timer to 64
[    7.266726]   alloc irq_desc for 65 on node -1
[    7.266728]   alloc kstat_irqs on node -1
[    7.266731] alloc irq_2_iommu on node -1
[    7.266736] ioatdma 0000:00:16.0: irq 65 for MSI/MSI-X
[    7.266879] ioatdma 0000:00:16.1: enabling device (0000 -> 0002)
[    7.266882]   alloc irq_desc for 44 on node -1
[    7.266883]   alloc kstat_irqs on node -1
[    7.266886] alloc irq_2_iommu on node -1
[    7.266891] ioatdma 0000:00:16.1: PCI INT B -> GSI 44 (level, low) -> IRQ 44
[    7.266902] ioatdma 0000:00:16.1: setting latency timer to 64
[    7.266936]   alloc irq_desc for 66 on node -1
[    7.266938]   alloc kstat_irqs on node -1
[    7.266940] alloc irq_2_iommu on node -1
[    7.266944] ioatdma 0000:00:16.1: irq 66 for MSI/MSI-X
[    7.267097] ioatdma 0000:00:16.2: enabling device (0000 -> 0002)
[    7.267101]   alloc irq_desc for 45 on node -1
[    7.267103]   alloc kstat_irqs on node -1
[    7.267107] alloc irq_2_iommu on node -1
[    7.267113] ioatdma 0000:00:16.2: PCI INT C -> GSI 45 (level, low) -> IRQ 45
[    7.267126] ioatdma 0000:00:16.2: setting latency timer to 64
[    7.267162]   alloc irq_desc for 67 on node -1
[    7.267163]   alloc kstat_irqs on node -1
[    7.267165] alloc irq_2_iommu on node -1
[    7.267170] ioatdma 0000:00:16.2: irq 67 for MSI/MSI-X
[    7.267307] ioatdma 0000:00:16.3: enabling device (0000 -> 0002)
[    7.267312]   alloc irq_desc for 46 on node -1
[    7.267314]   alloc kstat_irqs on node -1
[    7.267317] alloc irq_2_iommu on node -1
[    7.267324] ioatdma 0000:00:16.3: PCI INT D -> GSI 46 (level, low) -> IRQ 46
[    7.267339] ioatdma 0000:00:16.3: setting latency timer to 64
[    7.267383]   alloc irq_desc for 68 on node -1
[    7.267386]   alloc kstat_irqs on node -1
[    7.267389] alloc irq_2_iommu on node -1
[    7.267395] ioatdma 0000:00:16.3: irq 68 for MSI/MSI-X
[    7.267527] ioatdma 0000:00:16.4: enabling device (0000 -> 0002)
[    7.267531] ioatdma 0000:00:16.4: PCI INT A -> GSI 43 (level, low) -> IRQ 43
[    7.267543] ioatdma 0000:00:16.4: setting latency timer to 64
[    7.267587]   alloc irq_desc for 69 on node -1
[    7.267589]   alloc kstat_irqs on node -1
[    7.267593] alloc irq_2_iommu on node -1
[    7.267599] ioatdma 0000:00:16.4: irq 69 for MSI/MSI-X
[    7.267743] ioatdma 0000:00:16.5: enabling device (0000 -> 0002)
[    7.267746] ioatdma 0000:00:16.5: PCI INT B -> GSI 44 (level, low) -> IRQ 44
[    7.267759] ioatdma 0000:00:16.5: setting latency timer to 64
[    7.267794]   alloc irq_desc for 70 on node -1
[    7.267796]   alloc kstat_irqs on node -1
[    7.267798] alloc irq_2_iommu on node -1
[    7.267803] ioatdma 0000:00:16.5: irq 70 for MSI/MSI-X
[    7.267950] ioatdma 0000:00:16.6: enabling device (0000 -> 0002)
[    7.267955] ioatdma 0000:00:16.6: PCI INT C -> GSI 45 (level, low) -> IRQ 45
[    7.267970] ioatdma 0000:00:16.6: setting latency timer to 64
[    7.268012]   alloc irq_desc for 71 on node -1
[    7.268013]   alloc kstat_irqs on node -1
[    7.268016] alloc irq_2_iommu on node -1
[    7.268021] ioatdma 0000:00:16.6: irq 71 for MSI/MSI-X
[    7.268152] ioatdma 0000:00:16.7: enabling device (0000 -> 0002)
[    7.268157] ioatdma 0000:00:16.7: PCI INT D -> GSI 46 (level, low) -> IRQ 46
[    7.268173] ioatdma 0000:00:16.7: setting latency timer to 64
[    7.268217]   alloc irq_desc for 72 on node -1
[    7.268219]   alloc kstat_irqs on node -1
[    7.268222] alloc irq_2_iommu on node -1
[    7.268228] ioatdma 0000:00:16.7: irq 72 for MSI/MSI-X
[    7.273295] i801_smbus 0000:00:1f.3: PCI INT C -> GSI 18 (level, low) -> IRQ 18
[    7.277431] Monitor-Mwait will be used to enter C-1 state
[    7.277533] Monitor-Mwait will be used to enter C-2 state
[    7.278051] Monitor-Mwait will be used to enter C-3 state
[    7.278131] processor LNXCPU:00: registered as cooling_device0
[    7.278197] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input7
[    7.278226] ACPI: Power Button [PWRF]
[    7.278892] processor LNXCPU:01: registered as cooling_device1
[    7.279463] processor LNXCPU:02: registered as cooling_device2
[    7.280028] processor LNXCPU:03: registered as cooling_device3
[    7.280564] processor LNXCPU:04: registered as cooling_device4
[    7.283535] processor LNXCPU:05: registered as cooling_device5
[    7.284159] processor LNXCPU:06: registered as cooling_device6
[    7.284768] processor LNXCPU:07: registered as cooling_device7
[    7.285364] processor LNXCPU:08: registered as cooling_device8
[    7.285879] processor LNXCPU:09: registered as cooling_device9
[    7.286595] processor LNXCPU:0a: registered as cooling_device10
[    7.287125] processor LNXCPU:0b: registered as cooling_device11
[    7.287720] processor LNXCPU:0c: registered as cooling_device12
[    7.288295] processor LNXCPU:0d: registered as cooling_device13
[    7.288825] processor LNXCPU:0e: registered as cooling_device14
[    7.289485] processor LNXCPU:0f: registered as cooling_device15
[    7.290069] processor LNXCPU:10: registered as cooling_device16
[    7.290675] processor LNXCPU:11: registered as cooling_device17
[    7.296242] Error: Driver 'pcspkr' is already registered, aborting...
[    7.299964] processor LNXCPU:12: registered as cooling_device18
[    7.300702] processor LNXCPU:13: registered as cooling_device19
[    7.301409] processor LNXCPU:14: registered as cooling_device20
[    7.302091] processor LNXCPU:15: registered as cooling_device21
[    7.302741] processor LNXCPU:16: registered as cooling_device22
[    7.303410] processor LNXCPU:17: registered as cooling_device23
[    7.447430] Adding 8787960k swap on /dev/md1.  Priority:-1 extents:1 across:8787960k 
[    7.502237] loop: module loaded
[    7.660050] EXT4-fs (sdd1): mounted filesystem with ordered data mode
[    7.668827] EXT4-fs (sda3): mounted filesystem with ordered data mode
[    7.669375] EXT4-fs (sdc): Unrecognized mount option "0" or missing value
[    7.824669] ADDRCONF(NETDEV_UP): eth0: link is not ready
alfish
  • Anything in the logs? In `dmesg`? – Håkan Lindqvist Aug 13 '14 at 05:47
  • @HåkanLindqvist Just added the last 100 lines of `dmesg`. I'm not sure where or how to read them, as there is no timestamp. – alfish Aug 13 '14 at 05:53
  • To rule out a bad switch port, flick it over and see what you get. I've seen these messages before but it hasn't been bad. – hookenz Aug 13 '14 at 06:13
  • @alfish That looks to be what was logged during boot. I was hoping for something from when the actual problem occurred (either from log files or from `dmesg`). – Håkan Lindqvist Aug 13 '14 at 17:35
  • This is just the boot log, but it shows some other errors too, such as `EXT4-fs (sdc): Unrecognized mount option "0" or missing value`; you should check that. Before investigating whether the NIC is faulty, I would check other things first: the cable connections (maybe a cable has been bent too much) and the Apache settings, because a misconfigured Apache can also take down your network through network or memory overload (check MaxClients, for example). It could also be a DNS attack. Have a look at open connections with `netstat -platu`, and try tcpdump to look for incoming packets. – Broco Aug 14 '14 at 17:10
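
To make the `netstat` suggestion above easier to act on, the connection list can be reduced to a count per TCP state. This is only a rough sketch: the helper name `tcp_state_summary` is made up for illustration, and it assumes the usual Linux `netstat` column layout in which the state is the sixth field of each `tcp` line.

```shell
# Hypothetical helper: summarise netstat output (read from stdin) as
# "<count> <state>" lines, busiest state first. Assumes the state is
# the sixth column of lines whose protocol starts with "tcp".
tcp_state_summary() {
    awk '$1 ~ /^tcp/ { count[$6]++ }
         END { for (s in count) print count[s], s }' | sort -rn
}

# Usage, to be run on the affected server:
#   netstat -ant | tcp_state_summary
```

An unusually large number of connections in a single state (for example SYN_RECV or TIME_WAIT) shortly before the server stops responding would be a hint worth following up, though it is not proof of an attack or overload on its own.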

1 Answer

It may be worth checking the network device for NIC and driver errors using the statistics reported by ethtool:

ethtool -S "ethX"

Replace ethX with the name of your network interface (for example, eth0).

You can also run the network adapter's self-test with the -t argument (`ethtool -t ethX`), although that may disrupt the connection.
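
Since `ethtool -S` can print a long list of counters, here is a minimal sketch for narrowing it down to the ones that look like problems. The helper name `nonzero_error_counters` and the keyword list are assumptions for illustration; counter names differ between drivers, so the pattern may need adjusting for your NIC.

```shell
# Hypothetical helper: read `ethtool -S` output on stdin and print only
# counters whose name hints at a problem and whose value is non-zero.
# The keyword list is a guess; drivers name their counters differently.
nonzero_error_counters() {
    awk -F': *' 'tolower($1) ~ /err|drop|miss|fifo|collision/ && $2 + 0 > 0 {
        gsub(/^[ \t]+/, "", $1)   # strip the leading indentation
        print $1 ": " $2
    }'
}

# Usage (eth0 is an assumed interface name):
#   ethtool -S eth0 | nonzero_error_counters
```

Running it a few minutes apart and comparing the output shows whether any of these counters are still increasing, which is usually more telling than a single non-zero value.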

Sorry, this should be a comment, but I am not allowed to comment yet.

geedoubleya