
I would like to ask for help, since I am feeling a bit clueless here.

I recently bought a low end dedicated machine that was supposed to host some services: squid, proftpd and rtorrent.

I installed Debian lenny, immediately upgraded to squeeze, and configured the services. I started rtorrent, but once the machine reaches heavy load (> 10MBps network traffic, maxed-out CPU), it holds for a while, then all network connections drop and I have to order a hard reset to bring it back online.

I thought it was a misconfiguration issue, so I tried reconfiguring the server and then installing Ubuntu 10.04 on it, but I'm getting the same results.

I had a look at /var/log/kernel.log, and on Ubuntu I am seeing some "Clocksource tsc unstable" messages right before the machine crashes.

I can see the same kind of messages on squeeze as well, just not as close to the reboots as they were on Ubuntu. Google tells me they might have something to do with CPU frequency scaling. There are loads of reports from users like me who are experiencing random freezes, but there seems to be no clear answer: people solved the issue with video card driver updates, replacing bad hardware, changing the frequency scaling governor, and so on.

So far I have only played around with the frequency scaling governor; setting it to "performance" seems to freeze the machine sooner than the default "ondemand".
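
For anyone wanting to repeat the governor test, here is a minimal sketch of reading and switching the governor through sysfs. It assumes the usual cpufreq sysfs layout for cpu0; `CPUFREQ_DIR`, `show_governor` and `set_governor` are names made up for this illustration, and writing the governor needs root.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting and switching the cpufreq governor.
# CPUFREQ_DIR is an assumption; override it if your sysfs layout differs.
CPUFREQ_DIR="${CPUFREQ_DIR:-/sys/devices/system/cpu/cpu0/cpufreq}"

# Print the governor currently in use, or "unknown" if cpufreq is absent.
show_governor() {
    if [ -r "$CPUFREQ_DIR/scaling_governor" ]; then
        cat "$CPUFREQ_DIR/scaling_governor"
    else
        echo "unknown"
    fi
}

# Switch to governor $1, but only if the kernel lists it as available.
set_governor() {
    if grep -qw "$1" "$CPUFREQ_DIR/scaling_available_governors" 2>/dev/null; then
        echo "$1" > "$CPUFREQ_DIR/scaling_governor"
    else
        echo "governor '$1' not available" >&2
        return 1
    fi
}

show_governor
```

Running `set_governor ondemand` as root after sourcing the script would switch back to the default mentioned above.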

Here are the cpu specs of the machine:

# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 39
model name      : AMD Athlon(tm) 64 Processor 3700+
stepping        : 1
cpu MHz         : 2200.000
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt
lm 3dnowext 3dnow up pni lahf_lm
bogomips        : 4398.97
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
conservative userspace powersave ondemand performance

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
1000000 1800000 2000000 2200000

# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
acpi_pm
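
To correlate the "Clocksource tsc unstable" messages with the freezes, it may help to record which clocksource is active and how often the warning appears in a given log. This is only a sketch: `CLOCKSOURCE_DIR`, `current_clocksource` and `count_tsc_warnings` are hypothetical names, and the path of the kernel log file is left as an argument because it varies by distribution.

```shell
#!/bin/sh
# Sketch: report the active clocksource and count "tsc unstable" warnings.
# CLOCKSOURCE_DIR is an assumption based on the sysfs path used above.
CLOCKSOURCE_DIR="${CLOCKSOURCE_DIR:-/sys/devices/system/clocksource/clocksource0}"

# Print the clocksource the kernel is using right now.
current_clocksource() {
    cat "$CLOCKSOURCE_DIR/current_clocksource" 2>/dev/null || echo "unknown"
}

# Count matching warnings in the log file given as $1 (0 if unreadable).
count_tsc_warnings() {
    [ -r "$1" ] || { echo 0; return 0; }
    # grep -c prints 0 but exits non-zero when nothing matches.
    grep -ci 'clocksource tsc unstable' "$1" || true
}

echo "active clocksource: $(current_clocksource)"
```

A rising count shortly before each freeze would at least be something concrete to show to support; on its own it does not prove a hardware fault.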

I asked the datacenter support to perform a hardware check and they tested the machine for 8 hours without errors.

Now, how can I find out what's going on with this server? I'm pretty sure it's faulty hardware, but I have no proof to show to the datacenter support.

I am currently running squeeze with kernel 2.6.32-5-686-bigmem. The machine has 1024MB of RAM and 2x160GB SATA HDDs. The NIC is a 100Mbit Realtek one, with proper drivers from the firmware-realtek Debian package.

I would love to have some opinions on how to deal with this.

user9517
D4rKr0W

2 Answers


You should collect some statistics, for example using Cacti, or Nagios with PNP4Nagios or NagiosGrapher. The load on the server was probably so high that it simply became unresponsive. This behaviour is not caused by problems in the kernel or in the environment, since too much load is a problem in itself. You may need to find proper resource usage limits.
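
If setting up a full monitoring stack is too much for a first look, a small sampling loop can stand in for it. The sketch below is not a replacement for Cacti or Nagios. It only appends the load average and free memory to a file and calls `sync` after each sample, so the last lines have a better chance of surviving a hard reset. `PROC_DIR`, `sample_line` and `log_samples` are hypothetical names chosen for this example.

```shell
#!/bin/sh
# Sketch: periodically log load average and free memory to a file.
# PROC_DIR is an assumption; it defaults to the standard /proc mount.
PROC_DIR="${PROC_DIR:-/proc}"

# Print one sample: the three load averages and MemFree in kB.
sample_line() {
    load=$(cut -d' ' -f1-3 "$PROC_DIR/loadavg")
    memfree=$(awk '/^MemFree:/ {print $2}' "$PROC_DIR/meminfo")
    echo "load=$load memfree_kb=$memfree"
}

# $1: output file, $2: interval in seconds, $3: number of samples.
log_samples() {
    i=0
    while [ "$i" -lt "$3" ]; do
        echo "$(date '+%Y-%m-%d %H:%M:%S') $(sample_line)" >> "$1"
        sync    # flush to disk so the sample survives a sudden freeze
        i=$((i + 1))
        if [ "$i" -lt "$3" ]; then
            sleep "$2"
        fi
    done
}
```

For example, `log_samples /var/tmp/load.log 10 360` would take one sample every 10 seconds for about an hour; the file name and interval are arbitrary.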

Alex
  • I will set up a remote syslog server, since it seems like a decent way of dealing with this kind of issue. I still don't think that much load should kill the machine and leave it in that state until a hard reset. It's an old machine, but I've set up older ones and those just crunched through the workload, becoming really slow but never unresponsive. – D4rKr0W Feb 08 '11 at 13:51

Have you tried capping the resources that rtorrent uses?

I had a similar problem and was able to alleviate it by limiting the amount of memory rtorrent uses.

In the rtorrent config, the parameter is max_memory_usage, and its value is in bytes.

For example, I set mine to:

max_memory_usage = 268435456
David