
Lately our Apache web server has been giving us this error multiple times per day:

[Tue Apr 06 01:07:10 2010] [error] Server ran out of threads to serve requests. Consider raising the ThreadsPerChild setting

We raised our ThreadsPerChild setting from 50 to 100, but we still get the error. Our access logs indicate that these errors do not even happen during periods of high load. For example, here's an excerpt from our access log (IP addresses and some URLs are edited for privacy). As you can see, the above error happened at 1:07, and only a small handful of requests occurred in the several minutes leading up to it:

99.88.77.66 - - [06/Apr/2010:00:59:33 -0400] "GET /WebRepository/jquery/jquery-ui-1.7.1.custom/css/smoothness/images/ui-icons_222222_256x240.png HTTP/1.1" 304 -
99.88.77.66 - - [06/Apr/2010:00:59:34 -0400] "GET /WebRepository/jquery/jquery-ui-1.7.1.custom/css/smoothness/images/ui-bg_glass_75_dadada_1x400.png HTTP/1.1" 200 111
99.88.77.66 - - [06/Apr/2010:00:59:34 -0400] "GET /WebRepository/jquery/jquery-ui-1.7.1.custom/css/smoothness/images/ui-bg_glass_75_dadada_1x400.png HTTP/1.1" 200 111
99.88.77.66 - mpeu [06/Apr/2010:00:59:40 -0400] "GET /some/dynamic/content HTTP/1.1" 200 145049
55.44.33.22 - mpeu [06/Apr/2010:01:06:56 -0400] "GET /other/dynamic/content HTTP/1.1" 200 12311
55.44.33.22 - - [06/Apr/2010:01:06:56 -0400] "GET /WebRepository/jquery/jquery-ui-1.7.1.custom/css/smoothness/jquery-ui-1.7.1.custom.css HTTP/1.1" 304 -
55.44.33.22 - - [06/Apr/2010:01:06:56 -0400] "GET /WebRepository/jquery/jquery-ui-1.7.1.custom/js/jquery-1.3.2.min.js HTTP/1.1" 304 -
55.44.33.22 - - [06/Apr/2010:01:06:56 -0400] "GET /WebRepository/jquery/jquery-ui-1.7.1.custom/js/jquery-ui-1.7.1.custom.min.js HTTP/1.1" 304 -
55.44.33.22 - - [06/Apr/2010:01:06:56 -0400] "GET /WebRepository/jquery.tablesorter.min.js HTTP/1.1" 304 -
55.44.33.22 - - [06/Apr/2010:01:06:56 -0400] "GET /WebRepository/date.js HTTP/1.1" 304 -
55.44.33.22 - - [06/Apr/2010:01:06:56 -0400] "GET /WebRepository/pdfs/image1.gif HTTP/1.1" 304 -
55.44.33.22 - - [06/Apr/2010:01:06:56 -0400] "GET /WebRepository/pdfs/image2.png HTTP/1.1" 304 -
55.44.33.22 - - [06/Apr/2010:01:06:56 -0400] "GET /WebRepository/pdfs/image3.png HTTP/1.1" 304 -
55.44.33.22 - - [06/Apr/2010:01:06:56 -0400] "GET /WebRepository/pdfs/image4.png HTTP/1.1" 304 -
55.44.33.22 - - [06/Apr/2010:01:06:56 -0400] "GET /WebRepository/pdfs/image5.png HTTP/1.1" 304 -
55.44.33.22 - - [06/Apr/2010:01:06:56 -0400] "GET /WebRepository/pdfs/image6.png HTTP/1.1" 304 -
55.44.33.22 - - [06/Apr/2010:01:06:56 -0400] "GET /WebRepository/pdfs/image7.png HTTP/1.1" 304 -
55.44.33.22 - - [06/Apr/2010:01:06:57 -0400] "GET /WebRepository/pdfs/image8.png HTTP/1.1" 304 -
55.44.33.22 - - [06/Apr/2010:01:06:57 -0400] "GET /WebRepository/pdfs/image9.png HTTP/1.1" 304 -
55.44.33.22 - - [06/Apr/2010:01:06:57 -0400] "GET /WebRepository/pdfs/imageA.png HTTP/1.1" 304 -
55.44.33.22 - - [06/Apr/2010:01:06:57 -0400] "GET /WebRepository/jquery/jquery-ui-1.7.1.custom/css/smoothness/images/ui-bg_flat_75_ffffff_40x100.png HTTP/1.1" 304 -
55.44.33.22 - - [06/Apr/2010:01:06:59 -0400] "GET /WebRepository/jquery/jquery-ui-1.7.1.custom/css/smoothness/images/ui-bg_highlight-soft_75_cccccc_1x100.png HTTP/1.1" 304 -
55.44.33.22 - - [06/Apr/2010:01:06:59 -0400] "GET /WebRepository/jquery/jquery-ui-1.7.1.custom/css/smoothness/images/ui-bg_glass_75_e6e6e6_1x400.png HTTP/1.1" 200 110
55.44.33.22 - - [06/Apr/2010:01:06:59 -0400] "GET /WebRepository/jquery/jquery-ui-1.7.1.custom/css/smoothness/images/ui-bg_glass_75_e6e6e6_1x400.png HTTP/1.1" 200 110
11.22.33.44 - mpeu [06/Apr/2010:01:18:03 -0400] "GET /other/dynamic/content HTTP/1.1" 200 12311
11.22.33.44 - - [06/Apr/2010:01:18:03 -0400] "GET /WebRepository/jquery/jquery-ui-1.7.1.custom/js/jquery-1.3.2.min.js HTTP/1.1" 304 -
11.22.33.44 - - [06/Apr/2010:01:18:04 -0400] "GET /WebRepository/jquery/jquery-ui-1.7.1.custom/css/smoothness/jquery-ui-1.7.1.custom.css HTTP/1.1" 200 27374
11.22.33.44 - - [06/Apr/2010:01:18:04 -0400] "GET /WebRepository/jquery/jquery-ui-1.7.1.custom/js/jquery-ui-1.7.1.custom.min.js HTTP/1.1" 304 -
11.22.33.44 - - [06/Apr/2010:01:18:04 -0400] "GET /WebRepository/jquery.tablesorter.min.js HTTP/1.1" 200 12795
11.22.33.44 - - [06/Apr/2010:01:18:04 -0400] "GET /WebRepository/date.js HTTP/1.1" 200 25809
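
To check the low-load claim against a whole log file rather than a short excerpt, the requests can be counted per minute. The following is a minimal sketch, assuming the access log uses the Common Log Format shown above; the function name `requests_per_minute` and the sample lines are only illustrative.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the bracketed timestamp of a Common Log Format line,
# e.g. [06/Apr/2010:01:06:56 -0400]. The UTC offset is ignored.
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\]")


def requests_per_minute(lines):
    """Return a Counter mapping 'YYYY-MM-DD HH:MM' to the number of requests."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S")
        counts[stamp.strftime("%Y-%m-%d %H:%M")] += 1
    return counts


# Three lines taken from the excerpt above.
sample = [
    '99.88.77.66 - mpeu [06/Apr/2010:00:59:40 -0400] "GET /some/dynamic/content HTTP/1.1" 200 145049',
    '55.44.33.22 - mpeu [06/Apr/2010:01:06:56 -0400] "GET /other/dynamic/content HTTP/1.1" 200 12311',
    '55.44.33.22 - - [06/Apr/2010:01:06:57 -0400] "GET /WebRepository/pdfs/image8.png HTTP/1.1" 304 -',
]

for minute, count in sorted(requests_per_minute(sample).items()):
    print(minute, count)
```

Passing an open file object instead of `sample` works the same way, since the function only iterates over lines.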

For what it's worth, we're running the version of Apache that ships with Oracle 10g (some 2.0 version), and we're using mod_plsql to generate our dynamic content. Since the Apache server runs as a separate process and the database doesn't record any problems when this error occurs, I'm doubtful that Oracle is the problem.

Unfortunately, the errors are freaking out our sysadmins, who are inclined to blame any and all problems which occur with the server on this error. Is this a known bug in Apache that I simply haven't been able to find any reference to through Google?

EDIT: At Embreau's request, here are the settings we're using (note that the Unix-specific ones such as MinSpareServers are commented out) [ANOTHER EDIT - except for ThreadsPerChild these are all just the default values that existed at installation]:

ServerType standalone
Timeout 300
SendBufferSize 16384
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15
MaxRequestsPerChild 0
ThreadsPerChild 100
#MinSpareServers 5
#MaxSpareServers 20
#MaxClients 150

FURTHER EDIT: This is a Windows Server 2003 system running on a 64-bit 1.6 GHz Itanium 2 server with 16 GB of RAM. We've started doing some logging to determine how much load the server is under while these errors occur; our Apache logs show that almost no one is hitting the website, but there are data collection processes happening in the background, so perhaps one of them has slowed Apache down enough to cause problems.
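
One way to line the errors up against the background jobs is to pull the exact error times out of the Apache error log. This is only a sketch, assuming the error log lines look like the one quoted at the top of the question; the function name `thread_exhaustion_times` is made up for illustration, and the month and weekday names are assumed to be in English.

```python
import re
from datetime import datetime

# Matches error-log lines such as:
# [Tue Apr 06 01:07:10 2010] [error] Server ran out of threads to serve requests. ...
ERROR_LINE = re.compile(r"^\[([^\]]+)\] \[error\] Server ran out of threads")


def thread_exhaustion_times(lines):
    """Return the timestamps of every 'ran out of threads' error, in file order."""
    times = []
    for line in lines:
        match = ERROR_LINE.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y"))
    return times


sample = [
    "[Tue Apr 06 01:07:10 2010] [error] Server ran out of threads to serve requests. "
    "Consider raising the ThreadsPerChild setting",
]

for when in thread_exhaustion_times(sample):
    print(when.isoformat())
```

The resulting timestamps can then be compared by hand with the schedule of the data collection processes.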

Eli Courtwright
  • Please add your full worker settings; this may help a lot in giving an accurate answer. – Embreau Apr 06 '10 at 22:04
  • @Embreau: I've edited my post to add all settings which seemed even a little relevant. – Eli Courtwright Apr 07 '10 at 14:26
  • You're running Apache under Windows? I think you'll find Apache runs a lot better under Unix environments. Try installing Linux and see how things go, or use IIS from Microsoft. – The Unix Janitor Apr 11 '10 at 23:15
  • @user37899: Oh how I wish that Linux were an option. However, we're developing for a client and they've chosen the technology stack, so we have no ability to do this. – Eli Courtwright Apr 12 '10 at 13:43
  • @Eli, oh well, I guess some people never learn. Windows is for running fun games on the desktop for teenagers, not for highly scalable internet servers. – The Unix Janitor Apr 12 '10 at 15:07
  • @user37899 Isn't it ironic that you're posting to one of the Trilogy sites (Stackoverflow, Serverfault and Superuser) serving millions of pages daily all run on Windows Servers? – splattne Apr 13 '10 at 12:55
  • @splattne I'm not talking about Windows, I'm talking about Apache. Apache was designed and developed on Unix-type systems. Sure, you can run it on Windows, but why bother when you have IIS? Server Fault should really be divided into two sites, one for Unix-type systems and one for Windows. In fact, there are so many free-software-hating trolls on this site that it makes me upset. – The Unix Janitor Apr 14 '10 at 14:54
  • @user37899 sorry, I was referring to "windows is for running fun games on the desktop for teenagers, not for highly scalable internet servers." - btw: you can flag posts or comments of trolls. We will review them. – splattne Apr 14 '10 at 21:04
  • @TheUnixJanitor do you have any proven facts or references to back up this claim? Other than your own personal experiences that could be heavily biased? In 2016, both Windows and Linux/Unix are great server operating systems. EDIT: I just realized this was posted 6 years ago. LOL. – cbmeeks Aug 09 '16 at 14:30

2 Answers


Your Timeout value is set to 300 seconds, which is 5 minutes. Set it to a more reasonable value, such as 15 or 30 seconds.

Now, your problem might be with the ThreadsPerChild value. Set it to at least 250. Please monitor the change in Task Manager under load to be sure it is not overkill (it probably isn't; I've set it higher on some old single-core CPUs serving busy sites).

If I understand correctly, this is a Windows OS? If so, which version, and on what kind of hardware (CPU and memory)?

Embreau
  • I've edited my post to include the system info. As for raising the ThreadsPerChild count, since our logs show that we only have 2-3 people hitting our website every few minutes when these errors occur, I really feel like that's barking up the wrong tree. We did it once already, raising from 50 to 100, and the problem didn't get any better. We'll think about lowering the timeout, though I'm wary about making it too low since there are a couple of really slow pages in this site (none of which are accessed around the times when these errors occur). – Eli Courtwright Apr 08 '10 at 14:04

While your configuration settings have room for improvement, such as what Embreau mentioned, they may not be the direct cause.

It's potentially your application or something along the stack causing the issue.

For example, if your application were waiting for a response from a database, every thread could eventually end up blocked, causing problems even under low load. This behavior often shows up as active database connections churning.
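
A rough back-of-the-envelope calculation shows how little traffic this scenario needs. The sketch below applies Little's law (average busy threads = arrival rate × time each request holds a thread) to the values from the question, ThreadsPerChild 100 and Timeout 300. It assumes, purely for illustration, that every stalled request holds its thread for the full Timeout; the function names are hypothetical.

```python
def busy_threads(arrival_rate_per_s, hold_seconds):
    """Little's law: average number of threads in use at steady state."""
    return arrival_rate_per_s * hold_seconds


def rate_to_exhaust(threads, hold_seconds):
    """Request rate (per second) at which all threads are busy on average."""
    return threads / hold_seconds


THREADS_PER_CHILD = 100  # value from the question
TIMEOUT_SECONDS = 300    # value from the question

# Assumption: a blocked request occupies its thread for the whole Timeout.
rate = rate_to_exhaust(THREADS_PER_CHILD, TIMEOUT_SECONDS)
print(f"Threads exhausted at about {rate:.2f} requests/s, "
      f"or one request every {1 / rate:.0f} seconds")

# With a healthy 0.5 s response time, the same rate needs far fewer threads.
print(f"Busy threads at that rate with 0.5 s responses: {busy_threads(rate, 0.5):.2f}")
```

Under that assumption, roughly one stalled request every three seconds would be enough to occupy all 100 threads, which is why a blocked backend can produce this error without anything resembling high load.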

The same behavior could be caused by an application bug, which would be more difficult to isolate. Even so, unless there are hints that this is the cause, I would focus on the two things below first.

Is there a particular reason why you have ThreadsPerChild or SendBufferSize configured at all? With ThreadsPerChild, unless there is an unusual need or you have given the proper thought to its use, the default should be fine. If it is not tuned properly, it could exhaust physical memory and begin swapping, which would reduce performance.

MaxRequestsPerChild set to 0 is unwise. If your application has memory leaks, the Apache children will never recycle. You want them to recycle.

I'm guessing you are a developer. Your system administrators should be working closely with you to resolve this issue, as it is definitely a cross-functional issue.

Warner
  • Every parameter listed is set to the defaults from when we installed Apache (technically when we installed Oracle, which came bundled with Apache), except for `ThreadsPerChild`, which was only increased after we started getting this error; I've edited my question to clarify this. I suppose we can try editing `MaxRequestsPerChild` in case that reduces or eliminates the error. As you guessed, I'm a developer and not a sysadmin, but the sysadmins who work for the client are... less helpful than we'd like. Thanks for the suggestions; let me know if you think of anything else. – Eli Courtwright Apr 12 '10 at 17:53
  • It seems unusual to me that `MaxRequestsPerChild` would default to 0. I'll give it some more thought-- let me know how the troubleshooting goes. – Warner Apr 12 '10 at 21:21
  • Well I got pulled onto another project before I could spend much time diagnosing this problem, but thanks for the suggestions; I'm marking your answer as accepted because I appreciate the thoughtful advice. – Eli Courtwright Apr 15 '10 at 21:22