
We have a fairly large VPS running our custom server code (game servers, not web servers). At random times the server seems to stall for a few seconds. How can we track down the thread or process that is causing it?

Performance Monitor can tell us which resource is stalling (CPU, hard disk, etc.), but it cannot tell us what is using that resource at the time of the stall.

Any ideas?

Thanks

  • Have you tried something like [Process Monitor](http://technet.microsoft.com/en-gb/sysinternals/bb896645.aspx)? Also, what sort of specs are the servers and what game(s) is it running? – tombull89 Mar 14 '13 at 10:24

1 Answer


This is complicated by the fact that it's a virtual machine and you're not able to take measurements from the hypervisor. You cannot really get good measurements of what's happening at the physical machine level from inside a VM.

It also depends on exactly what you mean by "processes stall." But there are a few things you can look into to at least get you started.

Get Process Explorer, which is like Task Manager on steroids. Run that, and observe how much CPU is consumed by hardware interrupts and DPCs during one of these events. If it's really high (it shouldn't be more than about 5%) then you're looking at a driver issue. You can then inspect the System process and see the CPU usage for every individual thread in it. The busiest thread will usually have the name of a *.sys file in it, and that will be the driver that's causing the issue.

The second tool I would turn to is Xperf. Xperf is an extremely powerful and flexible system profiling tool. It will tell you what is causing the performance problems on your server if you use it right.
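If you want a rough first pass before digging into a full trace, one option is to log per-process CPU time at a fixed interval and diff consecutive samples around a spike. The following is only a minimal Python sketch of that diffing step, not something Process Explorer or Xperf does for you. The snapshot format (PID mapped to process name and cumulative CPU seconds), the function name `top_cpu_consumers`, and the example process names are all assumptions made for illustration. How you collect the snapshots (for example with a third-party library such as psutil) is left out.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot shape: pid -> (process name, cumulative CPU seconds).
Snapshot = Dict[int, Tuple[str, float]]


def top_cpu_consumers(before: Snapshot, after: Snapshot,
                      interval_s: float, top_n: int = 3
                      ) -> List[Tuple[int, str, float]]:
    """Rank processes by CPU time used between two snapshots.

    Returns (pid, name, percent of one core) tuples, busiest first.
    On a multi-core machine a single process can exceed 100.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    usage = []
    for pid, (name, cpu_after) in after.items():
        prev = before.get(pid)
        # No baseline if the process is new, or if the PID was reused
        # by a different program; in both cases count from zero.
        cpu_before = prev[1] if prev and prev[0] == name else 0.0
        delta = max(cpu_after - cpu_before, 0.0)
        usage.append((pid, name, 100.0 * delta / interval_s))
    usage.sort(key=lambda row: row[2], reverse=True)
    return usage[:top_n]


if __name__ == "__main__":
    # Made-up samples taken 5 seconds apart around a stall.
    before = {4: ("System", 10.0), 812: ("gameserver.exe", 50.0),
              990: ("backup.exe", 1.0)}
    after = {4: ("System", 10.1), 812: ("gameserver.exe", 50.5),
             990: ("backup.exe", 5.0)}
    for pid, name, pct in top_cpu_consumers(before, after, interval_s=5.0):
        print(f"{pid:>6} {name:<20} {pct:6.1f}%")
```

This only covers CPU, and only what is visible from inside the VM. It will not show contention from other guests on the same host, and interrupt/DPC time will not be attributed to a driver this way, which is where the tools above come in.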

Ryan Ries
  • Essentially, at random times disk use, CPU use, etc. spike to such a level that it causes our application to drop connections. We need to know what process is causing this and why. Performance Monitor can show us the spikes but gives no indication of what is causing them, so we need to be able to record the server's resource usage and, when we come across a spike, attribute it to a specific process or application. We don't seem to be able to do this with Windows tools, although we are now checking out Xperf as suggested. – Martyn Hughes Mar 18 '13 at 11:10