How do I use awk to parse the Apache access log file to display information in the following format?
Date Time Count IP Address
2016-05-26 00:00 200 192.168.1.x
2016-05-26 00:00 152 172.17.100.x
2016-05-26 00:01 43 192.168.1.x
To be clear: I do not want total requests per hour, and I do not want total requests per minute. I already know how to write basic awk scripts for both of those tasks.
I want to see how many requests per minute each unique IP address is sending. I'm not savvy enough with awk to do this.
Apache Log Format
LogFormat "%h %l %u %{%F %T %z}t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\""
Sample
I tailed the end of the log file. Here's a small sample of what it contains. (We have over 100K entries for today. It's not feasible to share them all here. If more lines are needed please ask.)
54.213.236.39 - - 2016-05-26 14:38:51 -0400 "GET /p1077921.html HTTP/1.0" 403 400 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_77)"
54.213.236.39 - - 2016-05-26 14:38:51 -0400 "GET /p1060432.html HTTP/1.0" 403 398 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_77)"
54.213.254.166 - - 2016-05-26 14:38:51 -0400 "GET /p819757.html HTTP/1.0" 403 400 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_77)"
54.213.236.39 - - 2016-05-26 14:38:51 -0400 "GET /p1084269.html HTTP/1.0" 403 400 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_77)"
107.23.252.229 - - 2016-05-26 14:38:51 -0400 "GET /p305987.html HTTP/1.0" 403 399 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_77)"
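For reference, with this log format awk's default whitespace splitting should put the client IP in $1, the date in $4, and the time in $5 (the UTC offset lands in $6). Below is a minimal sketch that checks this mapping on the first sample line. The function name `extract_fields` is only a placeholder, and trimming the time to HH:MM with substr is an assumption about how "per minute" would be derived.

```shell
# Hypothetical helper: print date, minute and client IP for each log line.
# Assumes the LogFormat above, i.e. $1 = IP, $4 = date, $5 = HH:MM:SS.
extract_fields() {
  awk '{ print $4, substr($5, 1, 5), $1 }' "$@"
}

# Try it on the first sample line from the log.
printf '%s\n' '54.213.236.39 - - 2016-05-26 14:38:51 -0400 "GET /p1077921.html HTTP/1.0" 403 400 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_77)"' |
  extract_fields
# prints: 2016-05-26 14:38 54.213.236.39
```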
Example 1: total requests per IP address for the day (top 40, hostnames resolved with logresolve)
grep '2016-05-26' access.log | awk '{print $1}' | sort | uniq -c | sort -n | tail -40 | awk '{print $2,$2,$1}' | logresolve | awk '{printf "%6d %s (%s)\n",$3,$1,$2}'
That produces the following output:
307 135-23-174-138.cpe.pppoe.ca (135.23.174.138)
313 5265DCE5.cm-8.dynamic.ziggo.nl (82.101.220.229)
378 92-108-204-76.dynamic.upc.nl (92.108.204.76)
405 0191301456.0.fullrate.ninja (90.185.180.167)
632 ec2-52-58-151-132.eu-central-1.compute.amazonaws.com (52.58.151.132)
798 187.228.212.148 (187.228.212.148)
877 207.246.75.253 (207.246.75.253)
966 ec2-54-213-177-120.us-west-2.compute.amazonaws.com (54.213.177.120)
1116 ec2-54-186-148-0.us-west-2.compute.amazonaws.com (54.186.148.0)
1224 ppp121-44-247-209.bras2.syd2.internode.on.net (121.44.247.209)
1369 ec2-54-187-239-46.us-west-2.compute.amazonaws.com (54.187.239.46)
1584 45.55.189.64 (45.55.189.64)
2658 50-77-47-70-static.hfc.comcastbusiness.net (50.77.47.70)
Example 2: total requests per minute across all IP addresses (only minutes with more than 10 requests)
grep "2016-05-26" access.log | awk '{ print $4, $5, $1}' | cut -f2 | awk -F: '{ print $1":"$2 }' | sort -nk1 -nk2 | uniq -c | awk '{ if ($1 > 10) print $0 }'
That gives the following output:
560 2016-05-26 00:00
534 2016-05-26 00:01
538 2016-05-26 00:02
554 2016-05-26 00:03
566 2016-05-26 00:04
534 2016-05-26 00:05
559 2016-05-26 00:06
531 2016-05-26 00:07
540 2016-05-26 00:08
435 2016-05-26 00:09
312 2016-05-26 00:10
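One possible way to get from Example 2 to a per-IP breakdown is to keep the IP in the counting key instead of dropping it. The sketch below is untested against the full log. It assumes the field layout from the LogFormat above ($1 = IP, $4 = date, $5 = time), and `count_per_minute_ip` is a made-up name. It counts in an awk associative array and leaves the ordering to sort, because awk's `for (key in array)` order is unspecified.

```shell
# Hypothetical helper: count requests per (date, minute, IP).
# Assumes $1 = client IP, $4 = date, $5 = HH:MM:SS, as in the LogFormat above.
count_per_minute_ip() {
  awk '
    {
      minute = substr($5, 1, 5)           # HH:MM:SS -> HH:MM
      count[$4 " " minute " " $1]++       # key: date, minute, IP
    }
    END {
      for (key in count) {
        split(key, part, " ")             # part[1]=date, part[2]=minute, part[3]=IP
        print part[1], part[2], count[key], part[3]
      }
    }
  ' "$@" | sort -k1,1 -k2,2 -k3,3nr       # by date, then minute, then count descending
}

# Run against the five sample lines shown earlier.
count_per_minute_ip <<'EOF'
54.213.236.39 - - 2016-05-26 14:38:51 -0400 "GET /p1077921.html HTTP/1.0" 403 400 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_77)"
54.213.236.39 - - 2016-05-26 14:38:51 -0400 "GET /p1060432.html HTTP/1.0" 403 398 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_77)"
54.213.254.166 - - 2016-05-26 14:38:51 -0400 "GET /p819757.html HTTP/1.0" 403 400 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_77)"
54.213.236.39 - - 2016-05-26 14:38:51 -0400 "GET /p1084269.html HTTP/1.0" 403 400 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_77)"
107.23.252.229 - - 2016-05-26 14:38:51 -0400 "GET /p305987.html HTTP/1.0" 403 399 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_77)"
EOF
```

On the sample, the first output line should be `2016-05-26 14:38 3 54.213.236.39`, followed by one line each for the other two addresses. On the real file the call would be `count_per_minute_ip access.log`, and a threshold like the `$1 > 10` test in Example 2 could be applied to the third column instead.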
All help is much appreciated.