I have been running RHEL 5.7 on a host with several VMs (KVM) just fine, with no major issues. The time came to upgrade to RHEL 6.1, as a few bugs had been fixed in that release.

When I start a VM or two under RHEL 6.1, the system becomes really sluggish. Even over SSH, keystrokes appear with a delay. System resources appear OK, except that dstat reports "missed X ticks" (the number varies between 1 and 20). I am using virtio on all the guests.

The server has decent hardware (IBM x3850 with 128G RAM).

Is anyone running RHEL 6.1 with KVM successfully?

I've tried it on two servers so far and got the same result!

Gman
  • What does syslog say? – Lucas Kauffman Aug 12 '11 at 09:55
  • Absolutely nothing, unfortunately. The best result I've had so far is using a RHEL 6.0 kernel on 6.1, where the VMs could be started and the host still responds, but then I have networking issues (that I didn't have a day earlier when the box ran RHEL 5.7). I have set up a few virtual switches and connected VMs between them. They can ping each other, and they can establish a connection, but no data flows. Again, it was fine with RHEL 5.7 on the same hardware! – Gman Aug 12 '11 at 09:59
  • I suggest you contact RHEL support about this; it might be a deeper problem. – Lucas Kauffman Aug 12 '11 at 10:09
  • I have. They have escalated it to their Senior Engineers. I figured I may as well look for the solution elsewhere, as there must be others having similar issues... hence ServerFault :) – Gman Aug 12 '11 at 10:18

2 Answers

Maybe this is somehow related to ACPI/APIC or the kernel clock? I bet the kernel in RHEL 6.1 has gained dynamic ticks (a "tickless kernel") compared to the one in RHEL 5.7.

If you run vmstat 1 on your host, does the "in" (interrupts per second) column show a huge number of interrupts during the lag? Interrupt storms, although rare nowadays, can cause such stalls. If so, the cause might be ACPI or the APIC, and disabling them by appending the noapic and/or acpi=off parameters to the kernel line in the GRUB boot menu might help.
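
To see which interrupt sources are busy during a stall, one option is to sample /proc/interrupts twice and compare the counters. The following is only a minimal sketch: the function names are made up for illustration, and it assumes the usual /proc/interrupts layout (a header row of CPU columns, then one row per interrupt source).

```python
import time


def parse_interrupts(text):
    """Return {irq_label: count summed over all CPUs} from /proc/interrupts text."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not an interrupt row
        total = 0
        for token in rest.split()[:ncpu]:
            if not token.isdigit():
                break  # reached the description columns
            total += int(token)
        totals[label.strip()] = total
    return totals


def interrupt_rates(before, after, seconds):
    """Interrupts per second for each source between two snapshots."""
    return {irq: (after[irq] - before.get(irq, 0)) / seconds for irq in after}


def sample(path="/proc/interrupts", interval=1.0):
    """Take two snapshots `interval` seconds apart and return per-source rates."""
    with open(path) as f:
        first = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_interrupts(f.read())
    return interrupt_rates(first, second, interval)
```

Calling sample() on the host while the VMs are starting and sorting the result by rate should show whether a single source (for example the timer) dominates.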

If this is about dynamic ticks, passing nohz=off as a boot parameter in GRUB might help.
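
After rebooting, it is worth confirming that the parameter actually reached the kernel, which can be done by reading /proc/cmdline. A small sketch follows; the helper names and the sample command line in the usage note are hypothetical, not taken from the poster's system.

```python
def boot_params(cmdline):
    """Parse a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def read_cmdline(path="/proc/cmdline"):
    """Return the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()
```

For example, boot_params("ro quiet nohz=off noapic") maps "nohz" to "off" and "noapic" to None, so boot_params(read_cmdline()).get("nohz") == "off" is a quick check on the host.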

If this is about something else, well, let's hope RHEL engineers can help you. :)

Janne Pikkarainen
  • Thanks very much Janne! `nohz=off` made a big difference. At least I can start up VMs and the system responds. The only problem left is some sort of networking issue between VMs. 2 VMs belong to the same virtual switch, but they can't send data. They establish the connection OK, but that's as far as the connection goes. I also saw a message on the host that said `hrtimer: interrupt took 130910 ns`. Apart from that, no other interesting messages. – Gman Aug 12 '11 at 13:15
  • Well, well! It would not be the first time. My workstation with Fedora 15 has the same kind of bug, but I did not bother filing an RH bug since my desktop does not have HW virtualization support of any kind. Seems it's time to investigate this more. – Janne Pikkarainen Aug 12 '11 at 13:23
  • Janne, I'll mark your answer as the correct answer as it certainly helped make the server run much smoother. Thanks very much!! – Gman Aug 15 '11 at 06:32
  • You're welcome. New features such as the tickless kernel always seem to ship with new exciting bugs. – Janne Pikkarainen Aug 15 '11 at 07:50

Check for a BIOS update for your server. Fujitsu, at least, released a BIOS update for the server I've used (RX300S6) which included improved support for new Linux kernels. Unfortunately they didn't go into specifics, but I think it might have something to do with this.

onik
  • Thanks for the suggestion. I've checked the latest BIOS updates and I couldn't see any updates that looked promising. I am currently giving @Janne's suggestions a try. I will apply the updates afterwards (so I know what fixed it, if/when it gets fixed!) – Gman Aug 12 '11 at 12:54