In the last year I've commissioned two brand-new servers whose performance made them unusable. Both were Dell R620 servers with one processor; one had 6 cores and the other 8. One ran SLES 11 SP3 with Oracle, and the other ran Windows 2008 R2.

The Windows server was sluggish from the minute I got the OS installed. I was shocked at how slow it was at everything from bootup to app usage, yet the performance counters for CPU, disk, and memory showed no obvious symptoms. I can't quantify the slowness, but it felt as if I had installed the OS on a machine 10 or more years old. I finally fixed it by fiddling with the BIOS settings and disabling Hyperthreading. As soon as I turned it off, the server took off. I would estimate a 10x performance increase.

The Linux server was even odder. It initially worked very well for 3 or 4 weeks. Then, one evening, without any apparent trigger, the CPU usage suddenly went from a flatline at about 4% to swinging erratically between 20% and 60%. At the same time Oracle connection times went from 100ms to 500ms. Overall Oracle performance was so bad that our production processes were affected, and we don't place much of a load on the database. The DBA and I spent 12+ hours and could find nothing to explain the problem. I looked at the system using top and the Gnome system monitor, and the CPU traces were completely chaotic, jumping anywhere from 0 to 100%. We rebooted several times and boot times were probably 2-3x normal. The problem was finally fixed when, in desperation, I disabled HT in the BIOS. Magic. All was fixed.

My question is, have others experienced this? I've Googled quite a bit and people talk about relatively small performance impacts, good and bad, but nothing like what I have seen. I am now completely afraid of HT and have been making it a default to disable it on new builds. Is there something else that I'm not understanding that could cause this?

Could this be actual defective hardware?

EDIT: As shodanshok suggested below, this could actually be a power profile issue. The problem occurred again today, even with HT disabled. I went into the BIOS settings and found the power saving settings under "System Profiles". It defaulted to "Performance per Watt". I changed it to "Performance" and the problem is gone again. It is hard to confirm this is a final fix, since the reboot alone could have disrupted the problem, but I'm feeling good about this being it. I'll follow up again after a bit.

EDIT2: CONFIRMATION. I have seen this problem at least two more times, on two other servers. In all cases it was fixed by changing the "System Profile" to "Performance". I have not seen a recurrence of this problem on any server after making the change.

CactusPCJack
  • Hyperthreading can reduce performance in some workloads, but not quite like that. Something else must be going on there. – Michael Hampton Feb 14 '15 at 21:49
  • Exactly, this was severe degradation in both cases. I wouldn't even bring it up if it had only happened once or if a specific workload was slightly degraded. This was generalized severe performance degradation in both cases. I'm mystified. – CactusPCJack Feb 14 '15 at 22:23
  • Do you have performance graphs? – ewwhite Feb 14 '15 at 22:38
  • I've seen hyperthreading issues on AMD processors that were inexplicably causing the CPUs to go into an aggressive power saving mode, reducing the clock speed to 800 MHz... these were also Dell servers a few years ago. – Tim Brigham Feb 14 '15 at 23:02
  • @TimBrigham: Hyperthreading is Intel technology and never was available on AMD processors, there could be no issues with it, since it is not present. – Andrey Sapegin Feb 14 '15 at 23:09
  • @Andrey indeed you are correct... we ended up replacing the servers with an AMD-based model. Drawback of typing on a phone... hard to backcheck – Tim Brigham Feb 14 '15 at 23:57

1 Answer

On recent Dell servers, I have found the BIOS-based power saving logic to be quite bad (if not plain broken). Try disabling it: set the server for maximum performance and let power saving be controlled by the OS, not the BIOS.

Then try to re-enable hyperthreading.
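
On the Linux box, a rough way to see what the cores are doing before and after changing the profile is to read the standard sysfs entries for frequency scaling and thread siblings. This is only a minimal sketch, not a diagnostic tool: the helper names (`read_text`, `parse_cpu_list`, `cpu_report`) are hypothetical, and the `cpufreq` files may be missing when the BIOS rather than the OS controls power management, in which case the corresponding values come back as `None`.

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")  # standard Linux sysfs location


def read_text(path):
    """Return stripped file contents, or None if the file is missing or unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as '0,8' or '0-3' into a sorted list of ints."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def cpu_report(root=CPU_ROOT):
    """Collect governor, current/max frequency (kHz) and thread siblings per CPU."""
    report = {}
    for cpu_dir in sorted(root.glob("cpu[0-9]*")):
        siblings = read_text(cpu_dir / "topology" / "thread_siblings_list")
        cur = read_text(cpu_dir / "cpufreq" / "scaling_cur_freq")
        max_ = read_text(cpu_dir / "cpufreq" / "cpuinfo_max_freq")
        report[cpu_dir.name] = {
            "governor": read_text(cpu_dir / "cpufreq" / "scaling_governor"),
            "cur_khz": int(cur) if cur else None,
            "max_khz": int(max_) if max_ else None,
            "siblings": parse_cpu_list(siblings) if siblings else [],
        }
    return report


if __name__ == "__main__":
    for name, info in cpu_report().items():
        print(name, info)
```

If `cur_khz` stays far below `max_khz` while the machine is under load, frequency scaling is worth a closer look. Two entries in `siblings` for a core indicate that hyperthreading is active on it.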

shodanshok