1

I have my Apache logs set up like this:

LogFormat "%v %t %I %O" billing

How can I use AWK to generate a report which shows me the total bandwidth (received + sent) in MB per virtual host?

Here's an example log output:

bob.com  [3 JULY 2013]  903 299
bob.com  [8 JULY 2013]  192 138
luke.com [12 JULY 2013]  34 123
bob.com  [19 JULY 2013] 616 213
luke.com [22 JULY 2013]  23  74

I'm looking for output that sums the 3rd and 4th columns for bob.com and luke.com without actually specifying the domains: I have 50+ domains and wouldn't want to maintain a list. It would be much easier to have the printout consolidated.

James

2 Answers

1

Or this:

awk '{T[$1]+=$NF+$(NF-1)} END{for(i in T) print i,T[i]}' file

would produce

bob.com 2361
luke.com 254

with your sample log file.

Scrutinizer
  • Awesome, thank you! The file is logged in bytes; how can I convert this formula to present the data in MB? – James Jun 08 '13 at 14:48
  • I think I got it figured out: I added '/1024/1024' before the last '}'. Thanks again! – James Jun 08 '13 at 14:55
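For anyone following along, here is a minimal sketch of the megabyte version described in the comment above, assuming the byte counts are always the last two fields of each line. The temporary file, the six-decimal format, and the final `sort` are illustrative choices, not part of the original answer; `sort` is only there because `for (h in T)` visits hosts in no particular order.

```shell
# Sample data from the question, written to a temporary file for illustration.
log=$(mktemp)
cat > "$log" <<'EOF'
bob.com  [3 JULY 2013]  903 299
bob.com  [8 JULY 2013]  192 138
luke.com [12 JULY 2013]  34 123
bob.com  [19 JULY 2013] 616 213
luke.com [22 JULY 2013]  23  74
EOF

# Sum the last two fields per host, then convert bytes to MB when printing.
report=$(awk '{ T[$1] += $(NF-1) + $NF }
     END { for (h in T) printf "%s %.6f MB\n", h, T[h] / 1024 / 1024 }' "$log" | sort)
echo "$report"

rm -f "$log"
```

With real traffic volumes a shorter format such as `%.2f` would probably read better; six decimals are used here only because the sample numbers are tiny.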
0

You can write a small script to do this job:

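#!/bin/bash

log_file="/path/to/logfile"
# Unique virtual hosts, taken from the first column of the log.
domains=`awk '{print $1}' "$log_file" | sort | uniq`

for domain in $domains
do
    # Anchor the pattern at the start of the line so that one host name
    # does not also match lines for a longer name that contains it.
    # With the sample log, the two byte counts start at field 5.
    sum=$(grep "^${domain}[[:space:]]" "$log_file" | \
            awk '{ for (i = 5; i <= NF; i++) s = s+$i }; END { print s+0 }')

    echo "Total bandwidth of $domain is $sum"

done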
cuonglm
  • This processes a potentially large file for *each* virtual host (i.e. many times) and stores potentially large amounts of data in a variable. Also, it uses *both* the archaic backticks and the preferred `$()` form of command substitution. The `echo` in the `for` statement is unnecessary. So is the `for` statement in the `awk` line (it's just two fields - just add them). The `s+0` isn't necessary. There's no need in this context to coerce an integer. A pipe acts as a line continuation character. No need for the backslash. – Dennis Williamson Jun 08 '13 at 11:27
  • The `for` in awk is unnecessary? I don't think so. `grep "$domain" $log_file` outputs the full lines, not just two fields. The `s+0` is there for a reason: the case when all bandwidth in a domain is zero. The backslash is used to make my code clearer. I accept all the other advice. – cuonglm Jun 08 '13 at 12:06
  • The OP's example shows only two bandwidth fields. If the bandwidth fields are zero, then a zero will be printed. Only empty fields will need to be coerced. One thing I forgot to mention in my previous comment is that it's unnecessary to pipe `grep` into `awk`, since AWK can do `grep`'s job. – Dennis Williamson Jun 08 '13 at 15:01
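To illustrate the last point in the comments, here is a rough sketch of letting awk do the filtering itself instead of piping `grep` into it. The function name `host_total` and the host `nosuch.example` are hypothetical, and the sketch assumes the byte counts are the last two fields of each line. Comparing `$1` as a string also avoids treating the dots in the host name as regex wildcards.

```shell
# host_total is a hypothetical helper name, not from the original answers.
# Usage: host_total HOST LOGFILE
host_total() {
    # Exact string match on the first field; sum the last two fields.
    awk -v host="$1" '$1 == host { s += $(NF-1) + $NF } END { print s + 0 }' "$2"
}

# Sample data from the question, written to a temporary file for illustration.
log=$(mktemp)
cat > "$log" <<'EOF'
bob.com  [3 JULY 2013]  903 299
bob.com  [8 JULY 2013]  192 138
luke.com [12 JULY 2013]  34 123
bob.com  [19 JULY 2013] 616 213
luke.com [22 JULY 2013]  23  74
EOF

host_total bob.com "$log"
host_total nosuch.example "$log"

rm -f "$log"
```

In this form the `s + 0` does matter for a host with no matching lines: `s` is never assigned, so a plain `print s` would emit an empty line rather than 0.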