
I am currently having a problem with a small Linux server that is providing file-sharing services to four Windows 7 32-bit clients. The server is an AMD PhenomX3 with two Western Digital 10EADS (1TB) drives, attached to a Gigabyte GA-MA770T-UD3 mainboard and running Ubuntu Server 10.04.1 LTS.

The client machines are taking an extremely long time to access or transfer data on the file server. Applications often become non-responsive while trying to open files located remotely, and one program waiting to open a file can prevent other software from accessing network resources at all. For example, a single image can take 20 seconds or more to open, and in one instance a user waited 110 seconds for Microsoft Word 2007 to save a document.

I had initially thought the problem was network-related, but this appears not to be the case. All cables and switches have been tested (one cable was replaced) for verification. This was additionally confirmed when closing down all client machines and rebooting the server resulted in the hard-drive light staying on solid during the startup process. For the first 15 minutes during boot, logon and after logging on (with no client machines attached), the system displayed a load average of 4 or higher. Symptoms included waiting several minutes for the logon prompt to appear, and then several minutes for the password prompt to appear after typing in a user name. After logon, it also took upwards of 45 seconds for the 'smartctl' man page to appear after the command 'man smartctl' was issued. After 15 minutes of this behaviour, the load average dropped to around 0.02 and the machine behaved normally.

I have also considered that the problem is hard-drive-related, but diagnostic programs reveal no drive problems. Western Digital DLG, Spinrite and SMARTUDM show no abnormal characteristics: the drives are in perfect health as far as the hardware is concerned.

I have thus far been completely unable to track down the cause of this problem, so any help is greatly appreciated.

Requested Information:

Output of 'free'
hxxp://pastebin.com/mfsJS8HS (stupid spam filter)
The command 'hdparm -d /dev/sda1' reports: HDIO_GET_DMA failed: Inappropriate ioctl for device (the BIOS is set to AHCI - I probably should have mentioned that).

  • This sounds like your Ubuntu is swapping a lot and thus sweating itself almost to death. Please add some more system information to your post: output of `free` and `vmstat 1`, at least. If swapping is not the case, maybe the Windows clients are constantly indexing the network share? – Janne Pikkarainen Nov 08 '10 at 08:10
  • Done. I have supplied the requested information. Swap shows it isn't being used at all. – CruftRemover Nov 08 '10 at 08:51
  • Have you removed the `free` output from the question? When the system is responding slowly, run `top` and look at the line "Cpu(s): 3.2%us, 2.6%sy, 0.0%ni, 94.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st". If the percentage before wa is high, it is an I/O problem; if the other percentages are higher, some process could be using the CPU. Once you give this feedback we can suggest more tests. – Saurabh Barjatiya Nov 08 '10 at 10:11
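
The check described in the last comment can be sketched in Python. This is only an illustration, assuming the older `top` output format quoted above (values such as `0.0%wa`); the function names and the 20% threshold are hypothetical choices, not anything defined by `top` or suggested in the thread.

```python
import re


def parse_cpu_line(line):
    """Parse a top 'Cpu(s):' summary line into a dict such as {'us': 3.2, 'wa': 0.0}.

    Assumes the older format where each value is written like '3.2%us'.
    """
    fields = re.findall(r"([\d.]+)%\s*([a-z]{2})", line)
    return {name: float(value) for value, name in fields}


def looks_io_bound(line, wa_threshold=20.0):
    """Return True if the I/O wait share ('wa') meets an arbitrary threshold.

    The threshold is an assumption for illustration, not an official cut-off.
    """
    return parse_cpu_line(line).get("wa", 0.0) >= wa_threshold


if __name__ == "__main__":
    sample = "Cpu(s): 3.2%us, 2.6%sy, 0.0%ni, 94.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st"
    print(parse_cpu_line(sample))
    print("I/O bound?", looks_io_bound(sample))
```

A high `wa` value while the machine is sluggish points at the disks or controller rather than at a CPU-hungry process.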

3 Answers


If it wasn't a 'modern' Ubuntu I'd say your hard drive has DMA disabled; check with `hdparm`. If memory serves me well it should be `hdparm -d1`, but you might want to check the manual page first.

Perhaps the controller you're using for the disk does not have DMA enabled by default, and it's relying on CPU for I/O?

Other possibilities are several processes contending for the hard drive at once, or swap space running out.

lorenzog
  • The drives are connected using AHCI. The command 'hdparm -d /dev/sda1' reports: HDIO_GET_DMA failed: Inappropriate ioctl for device – CruftRemover Nov 08 '10 at 08:52
  • I see. You might want to use `hdparm -d /dev/sda` instead, since you're addressing devices, not partitions. That said, SDA has DMA enabled by default, so this is probably not what you're looking for. – lorenzog Nov 08 '10 at 10:58

It sounds like you've made sure that the physical network infrastructure is sound. Have you ruled out a configuration problem? Some things I would try in your situation:

  1. Make sure that other internal resources are accessible, to rule out DNS, firewall and the like. For example, see if mounting the share by IP address makes the problem go away.
  2. Set up a Linux machine as a client and see whether the problem is reproduced regardless of the OS.
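
One rough way to compare the two cases in step 1 is to time the same read through each mount. The following is a minimal Python sketch, assuming the share is already mounted at paths you pass on the command line (for example, one mounted by hostname and one by IP address); the function name is illustrative only.

```python
import sys
import time


def time_read(path, chunk_size=1024 * 1024):
    """Read a file from start to end and return (bytes_read, seconds_taken)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


if __name__ == "__main__":
    # Each argument is a path to the same test file, reached through a
    # different mount of the share.
    for path in sys.argv[1:]:
        size, seconds = time_read(path)
        print(f"{path}: {size} bytes in {seconds:.2f} s")
```

Repeated reads may be served from a cache, so use a file larger than the client's memory or a different file for each run if the numbers look suspiciously fast.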
ztron

To answer my own question: one of the hard drives in the system was indeed faulty but, as I originally pointed out, none of the usual diagnostic programs would detect the fault. Eventually, out of frustration, I copied all critical data off the server, placed the two drives into two identical desktop systems I had been working on, and attempted to install Linux on each. One worked fine, but the other would almost always lock up while formatting the drive. Replacing the drive that locked up during formatting solved the problem, but I was still unable to determine the specific fault. Western Digital replaced the drive regardless.

  • Consumer grade hardware is really bad about reporting faults. The drives probably don't report much; the controller almost certainly doesn't report anything to the OS. There are reasons professionals don't use consumer grade hardware, and you've proved at least one of them. – Chris S Jan 01 '11 at 15:24