Collecting request duration logs from IHS
The Apache module mod_log_config allows various request properties to be logged. Of particular interest is %T:
The time taken to serve the request, in seconds.
Using the %{UNIT}T format, the duration can be logged in milliseconds instead of seconds, which gives a far more useful resolution for later analysis. So I extended the common log format in conf/httpd.conf to write the processing time as the first field:
LogFormat "%{ms}T %h %l %u %t \"%r\" %>s %b" common
Another good idea is to add the User-Agent as well, which gives some information about the type of client:
LogFormat "%{ms}T %h %l %u %t \"%r\" %>s %b \"%{User-Agent}i\"" common
First check the configuration file to make sure we didn't break anything:
/opt/IBM/HTTPServer # ./bin/apachectl -t
Syntax OK
Now we can do a graceful restart, even in production environments. This way, the webserver finishes all currently running requests before applying our new configuration. In contrast to a hard restart, most users shouldn't even notice that we restarted their webserver.
./bin/apachectl -k graceful
All new requests are now logged with their duration in milliseconds, like this:
100007 1.2.3.4 - - [03/Nov/2020:14:29:52 +0100] "POST /push/form/comet/connect HTTP/1.1" 200 96 "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:82.0) Gecko/20100101 Firefox/82.0"
Analyzing the measured data
We now have several hundred MB of logs from a single day, so a filter is required to extract useful information from them. To find the slow requests, we need to do two things:
- Filter on request duration: requests which are faster than XXX ms need to be sorted out.
- Filter out long-polling endpoints: HCL Connections and Kudos Boards use a rather old workaround to get notified about server updates: they keep Ajax requests open, an old predecessor of WebSockets. Those requests just spam our slow log with false positives.
I wrote the following command for this purpose:
log=/tmp/access-slow.log; egrep -i -v "(POST /(push|files)/|GET /(kudosboards/updates|api-boards)/|\.mp4(\?[A-Za-z0-9_\-=&]+)? HTTP)" logs/access_2020_11_03.log | egrep '^[0-9]{4,}' | sort -rnk1 > $log; wc -l $log
The first egrep call filters out all requests which are not interesting here: /push comes from the long-lived keep-alive connections, and Kudos Boards behaves the same way. I also excluded mp4 movies because those files are naturally large; that downloading a 500 MB movie takes some time is nothing we need to worry about. Sometimes those requests also appeared with GET parameters like movie.mp4?logDownload=true&downloadType=view, so those were filtered out as well.
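To verify that the mp4 exclusion really matches both variants, a quick sanity check against two invented request lines can help; both lines should be printed by egrep:
# Both sample lines (made up for this test) should match the mp4 exclusion pattern
printf '%s\n' \
  'GET /files/app/file/movie.mp4 HTTP/1.1' \
  'GET /files/app/file/movie.mp4?logDownload=true&downloadType=view HTTP/1.1' \
  | egrep '\.mp4(\?[A-Za-z0-9_\-=&]+)? HTTP'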
egrep '^[0-9]{4,}' sorts out all requests whose processing time has fewer than four digits, which basically means only entries of 1000 ms or more are kept. Since I'm looking for heavy performance issues, this seems a good point to start.
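If an explicit threshold is preferred over counting digits, a minimal awk sketch achieves the same effect (1000 ms is the assumed cutoff; the duration is the first field):
# Keep only requests that took at least 1000 ms
awk '$1 >= 1000' logs/access_2020_11_03.log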
sort -rnk1 orders our log entries numerically by duration in descending order, so we have the slowest entries at the beginning.
wc -l $log just tells us how many slow entries we collected this time. They can now be viewed with less $log. Our large access log with over 2 million entries was reduced to about 2,000.
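As a next step, a short aggregation can show which endpoints dominate the slow log. With this log format the request path is the eighth whitespace-separated field; the top-20 limit is an arbitrary choice:
# Count slow requests per URL path, worst offenders first
awk '{print $8}' /tmp/access-slow.log | sort | uniq -c | sort -rn | head -20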