
While transferring some files between disks on my server, I lost the connection and the server crashed. Plugging in a screen showed no response at all. I had to reset the server, after which it boots normally. At first I thought Samba was to blame, but today I verified that it also happens when I run move commands over SSH. I'm running Ubuntu Server (x64) from a USB stick on an Asus C60M-I with 2 GB of non-ECC memory. I recently switched the Realtek Ethernet driver from the default one (r8169) to r8168, but I doubt that has anything to do with it.

Before this all started, SMB transfers between the server's drives would begin fast (30-38 MB/s; one drive is SATA, the other, mentioned below, is eSATA) but would then slow down to an unreasonable speed, at which point I cancelled the transfer.

Since rebooting, I also get "permission denied" errors on all my disks when working as a non-root user. Because of this, Deluge can no longer write files, even though the mount points and all their contents (ext4) are owned by nobody with full access:

drwxrwxrwx 5 nobody nogroup  4096 Feb 13 22:40 blastoise
drwxrwxrwx 5 nobody nogroup  4096 Feb 12 02:54 charizard
drwxrwxrwx 4 nobody nogroup  4096 Feb 12 11:15 magikarp
drwxrwxrwx 4 nobody nogroup  4096 Feb 10 02:09 raichu
drwxrwxrwx 4 nobody nogroup  4096 Feb 14 03:35 ratata
drwxrwxrwx 9 nobody nogroup  4096 Feb  8 19:16 voltorb
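One thing worth ruling out is that the kernel remounted the filesystem read-only after the I/O errors (ext4 does this when its error behaviour is `errors=remount-ro`), in which case the directory modes above no longer matter. The following is a minimal Python sketch, not something from the thread, that reports whether a path is on a read-only mount and whether the current user can write to it. The mount points it checks are hypothetical placeholders.

```python
import os


def check_path(path: str) -> dict:
    """Report whether `path` is on a read-only mount and writable by this user."""
    st = os.statvfs(path)
    # ST_RDONLY is set in f_flag when the filesystem is mounted read-only.
    read_only = bool(st.f_flag & os.ST_RDONLY)
    # os.access() uses the real UID/GID and also fails (EROFS) on read-only mounts.
    writable = os.access(path, os.W_OK)
    return {"path": path, "read_only_mount": read_only, "writable": writable}


if __name__ == "__main__":
    # Hypothetical mount points; replace them with the directories you actually use.
    for p in ["/media/blastoise", "/media/charizard"]:
        if not os.path.exists(p):
            print(f"{p}: not found, skipping")
            continue
        info = check_path(p)
        print(f"{p}: read-only mount={info['read_only_mount']}, "
              f"writable by uid {os.getuid()}={info['writable']}")
```

If `read_only_mount` comes back `True`, the fix is at the filesystem level (fsck, then remount), not in the ownership or permission bits.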

Here is the syslog, which seems to point to the first and only partition (ext4) on sdd:

SYSLOG FILE

I then checked this drive with smartmontools. Apart from having reached a high temperature at one point (still within the 60°C maximum), it seems to be doing OK:

SMART STATUS FILE
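A passing overall SMART status does not rule out a failing disk. The attributes most often watched for media problems are Reallocated_Sector_Ct (ID 5), Current_Pending_Sector (197) and Offline_Uncorrectable (198). Here is a minimal Python sketch, assuming the default ATA attribute table printed by `smartctl -A` (RAW_VALUE as the tenth column), that flags non-zero raw values for those IDs. The `SAMPLE` text is made up for illustration and is not the poster's drive data.

```python
import subprocess

# SMART attribute IDs that commonly indicate failing media.
WATCHED = {5: "Reallocated_Sector_Ct", 197: "Current_Pending_Sector",
           198: "Offline_Uncorrectable"}

# Made-up example of `smartctl -A` output, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
194 Temperature_Celsius     0x0022   040   060   000    Old_age   Always       -       40
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""


def read_smart(device: str) -> str:
    """Run `smartctl -A` (needs smartmontools and usually root)."""
    # smartctl's exit status is a bitmask, so a non-zero code is not treated as failure.
    return subprocess.run(["smartctl", "-A", device],
                          capture_output=True, text=True, check=False).stdout


def parse_attributes(text: str) -> dict:
    """Map attribute ID -> integer raw value from the ATA attribute table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[int(fields[0])] = int(fields[9])
    return attrs


def problem_attributes(attrs: dict) -> list:
    """List watched attributes whose raw value is above zero."""
    return [f"{WATCHED[i]} = {attrs[i]}" for i in sorted(WATCHED) if attrs.get(i, 0) > 0]


if __name__ == "__main__":
    # On the server you would use parse_attributes(read_smart("/dev/sdd")) instead.
    print(problem_attributes(parse_attributes(SAMPLE)))
```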

I'm pretty new at this.

Chetan Bhargava
Zerreth
  • Can you post the last few lines from dmesg, please? Can you also paste the relevant excerpts from the two log files so that it's easier to understand your context? – drone.ah Feb 15 '13 at 16:40
  • `Feb 14 01:39:08 Server kernel: [98194.507411] Buffer I/O error on device sdd1, logical block 302254680` appears a few thousand times before the crash. End of the dmesg output: https://www.sugarsync.com/pf/D9249086_64373256_68690 – Zerreth Feb 15 '13 at 16:53
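To get a quick overview of how many of those errors occurred and which blocks they hit, a log like this can be summarised per device. Below is a minimal Python sketch that assumes the message format quoted in the comment above and Ubuntu's default log location `/var/log/syslog`. It is an illustration, not a tool anyone in the thread used.

```python
import os
import re
from collections import Counter

# Matches kernel lines such as "Buffer I/O error on device sdd1, logical block 302254680".
PATTERN = re.compile(r"Buffer I/O error on device (\S+), logical block (\d+)")


def summarize(lines):
    """Count Buffer I/O errors per device and collect the distinct logical blocks."""
    counts = Counter()
    blocks = {}
    for line in lines:
        m = PATTERN.search(line)
        if m:
            dev, block = m.group(1), int(m.group(2))
            counts[dev] += 1
            blocks.setdefault(dev, set()).add(block)
    return counts, blocks


if __name__ == "__main__":
    log = "/var/log/syslog"  # Ubuntu default; adjust for rotated logs.
    if os.path.exists(log):
        with open(log, errors="replace") as f:
            counts, blocks = summarize(f)
        for dev, n in counts.most_common():
            b = blocks[dev]
            print(f"{dev}: {n} errors, {len(b)} distinct blocks ({min(b)}..{max(b)})")
```

Many errors clustered on a narrow range of blocks points more towards bad sectors. Errors spread across the whole disk, or appearing on several devices at once, points more towards the controller or cabling.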

2 Answers


Bad news! It looks like you have a failing disk or disk controller. Try replacing sdd and see if that resolves the issue. If it doesn't, the problem is most likely your disk controller.

drone.ah

Every time I've encountered errors like that, the easiest solution was replacing the drive. Basically, your system hangs because everything ends up waiting for data to be written to disk: the load climbs and the machine becomes completely unresponsive. The cause could be bad sectors, a failing controller, or a bad ATA connector on the motherboard. Running the sort of tests that can identify the culprit usually means removing the disk and putting it into a non-production machine.
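The I/O-wait pile-up described above can be observed before the machine locks up completely. One option, not mentioned in the thread, is to sample the aggregate `cpu` line of `/proc/stat` twice and compute the share of time spent in iowait. This is a minimal Python sketch assuming the field order documented in proc(5) (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice). proc(5) also warns that the iowait counter is not fully reliable, so treat the number as a rough indicator.

```python
import os
import time


def read_cpu_fields(path="/proc/stat"):
    """Return the counters from the aggregate 'cpu' line of /proc/stat."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return [int(x) for x in line.split()[1:]]
    raise RuntimeError("no aggregate cpu line found")


def iowait_fraction(before, after):
    """Share of elapsed CPU time spent in iowait between two samples."""
    deltas = [a - b for a, b in zip(after, before)]
    # Only the first 8 fields are summed: guest time is already included in user/nice.
    total = sum(deltas[:8])
    return deltas[4] / total if total else 0.0


if __name__ == "__main__":
    if os.path.exists("/proc/stat"):
        first = read_cpu_fields()
        time.sleep(1)
        second = read_cpu_fields()
        print(f"iowait over the last second: {iowait_fraction(first, second):.1%}")
```

Running this during a large copy to the suspect disk and watching the value climb towards 100% would be consistent with the hang described in this answer.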

NickW