I programmed a web chat that displayed new entries by refreshing the chat log via an Ajax request. The request ran a PHP file, which rebuilt the chat log showing the latest 25 entries. Ajax refreshed this every second for every user.

Then, when around 10 users were online, the whole website became unbelievably slow and more or less crashed. I assumed the slowness came from the PHP script being run so often (more than 10 times per second).

I logged in to my vserver and launched htop to view the processes. The CPU was barely used: it stayed between 0% and 5%. RAM was only half used, at around 500 MB of 1 GB (which is normal, even before the chat went online).

I resolved the issue by creating a cron job that generates a single cached HTML page of the chat log and serving that page to all users.
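A minimal sketch of that caching idea, written in Python rather than PHP purely for illustration. The function name `write_chat_cache`, the `(user, text)` entry format, and the output path are assumptions, not the actual script. It renders the newest 25 entries once and swaps the file in atomically, so a reader never sees a half-written page:

```python
import html
import os
import tempfile


def write_chat_cache(entries, path, limit=25):
    """Render the newest `limit` entries to static HTML and swap it in atomically.

    `entries` is assumed to be a list of (user, text) tuples, oldest first.
    """
    latest = entries[-limit:]
    rows = "\n".join(
        f"<li><b>{html.escape(user)}</b>: {html.escape(text)}</li>"
        for user, text in latest
    )
    body = f"<ul>\n{rows}\n</ul>\n"

    # Write to a temporary file in the same directory, then rename it over
    # the old cache so web requests always read a complete file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(body)
    os.replace(tmp_path, path)
    return body
```

The scheduled job would call this once per run, and every client then fetches the same static file instead of triggering its own rebuild.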

Still, I wonder: why would the server become so slow even though the CPU and RAM were nowhere near busy?

user1211030
  • I should add that accessing the vserver via SSH was also extremely slow. – user1211030 Sep 19 '12 at 03:02
  • Also, pinging the server gave lines like 23 ms 21 ms Timeout Timeout 15 ms Timeout 34ms Timeout 23 ms 19 ms – user1211030 Sep 19 '12 at 03:03
  • There could be a heap of reasons, but one possibility is that you have a firewall or similar that is doing a reverse lookup on incoming IP addresses. You should probably post this on http://serverfault.com/ (same type of site, but aimed at server questions) – Robbie Sep 19 '12 at 04:25

1 Answer

If you aren't maxing out the CPU, then requests are blocking on some other resource. Some possible candidates:

  • Is there a database query for every request? If so, how long does each query take?

  • How many threads do you have available to serve requests? How long does each request take? If only one thread is serving requests, an incoming load of 100 requests per second would start backing up once a request took longer than 10 ms, at best. At the roughly 10 requests per second you describe (10 users polling once per second), the same limit is 100 ms per request.
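The arithmetic behind the thread question can be sketched as below. The function names are made up for illustration, and the 250 ms request time in the last line is a hypothetical figure, not a measurement from your server:

```python
def max_service_time_ms(requests_per_second, workers=1):
    """Longest average request time (ms) the workers can sustain without a backlog."""
    return 1000.0 * workers / requests_per_second


def utilization(requests_per_second, service_time_ms, workers=1):
    """Fraction of worker capacity in use; above 1.0 the queue keeps growing."""
    return requests_per_second * service_time_ms / (1000.0 * workers)


print(max_service_time_ms(100))  # 10.0  (one worker, 100 requests/s)
print(max_service_time_ms(10))   # 100.0 (one worker, 10 requests/s)
print(utilization(10, 250))      # 2.5   (hypothetical 250 ms per request)
```

A utilization above 1.0 means requests arrive faster than they are finished, so they pile up while the CPU can still look idle if each request is mostly waiting on something else.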

As a general strategy, I'd add timers and try to bisect the problem. Record the time at the beginning and end of the request. If the difference is small, you know the delay is somewhere outside your script.
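A rough sketch of the timing idea, in Python for illustration (in PHP the same thing can be done with `microtime(true)`). The `timed` helper, the labels, and the `time.sleep` calls standing in for a database query and rendering are all assumptions:

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(label):
    """Add the wall-clock time spent inside the block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[label] = timings.get(label, 0.0) + elapsed


def handle_request():
    """Hypothetical request handler with each stage wrapped in a timer."""
    with timed("total"):
        with timed("db_query"):
            time.sleep(0.02)   # stand-in for the database query
        with timed("render"):
            time.sleep(0.005)  # stand-in for building the HTML
    return timings
```

If `total` is small while users still wait seconds, the time is being lost before the script starts (queueing, network, DNS lookups); if one stage dominates, bisect inside that stage next.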

Also, for this kind of workload you should consider pushing updates rather than polling. That way you can push a single message to update listening clients and they don't have to poll frequently to get new messages right away.
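To illustrate the difference, here is a minimal in-process sketch of the push model using `asyncio` queues. The `ChatBroadcaster` class is hypothetical; a real chat would deliver the messages over something like WebSockets, server-sent events, or long polling, which this sketch leaves out:

```python
import asyncio


class ChatBroadcaster:
    """Keeps one queue per connected client and fans each message out to all."""

    def __init__(self):
        self._subscribers = set()

    def subscribe(self):
        queue = asyncio.Queue()
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue):
        self._subscribers.discard(queue)

    def publish(self, message):
        """Deliver one message to every listener; returns how many received it."""
        for queue in self._subscribers:
            queue.put_nowait(message)
        return len(self._subscribers)


async def demo():
    chat = ChatBroadcaster()
    alice, bob = chat.subscribe(), chat.subscribe()
    delivered = chat.publish("hello")
    # Each client simply waits on its queue; nothing runs until a message exists.
    return delivered, await alice.get(), await bob.get()


print(asyncio.run(demo()))  # (2, 'hello', 'hello')
```

Work on the server now scales with the number of messages sent rather than with users multiplied by the polling rate.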

Adam Hupp