Your pipeline is actually pretty good; it really just needs to scale for large counts. I replaced your tail -1000 access_log | awk '{ print $1 }' with an unsorted file of IP numbers from one of my web servers, and added head -20 to print just the 20 most active IP addresses.
$ sort ip.txt | uniq -c | sort -nr | \
> awk 'NR==1 { scale = $1/50 }
>      { printf("\n%-23s ", $0)
>        for (i = 0; i < ($1/scale); i++) printf("*")
>      }' | head -20
The important parts are NR==1 { scale = $1/50 }, which calculates the scaling factor that fits the maximum count into 50 characters, and printf("\n%-23s ", $0), which uses the width specifier %-23s to left-align the count and IP address within a 23-character field. In the output below, the top count of 824 gives scale = 824/50 = 16.48, so the next count of 149 works out to 149/16.48 ≈ 9.04, which the loop rounds up to 10 asterisks.
My output looks like this (I've masked the IP addresses):
824 xx.xxx.xx.39        **************************************************
149 xx.xxx.xxx.176      **********
138 xx.xxx.xxx.191      *********
137 xx.xxx.xxx.41       *********
105 xx.xxx.xxx.8        *******
 97 xx.xxx.xxx.21       ******
 96 xx.xxx.xx.220       ******
 91 xx.xx.xxx.198       ******
 87 xx.xxx.xxx.195      ******
 85 xx.xxx.xx.221       ******
 79 xxx.xxx.xxx.86      *****
 69 xx.xx.xx.12         *****
 68 xxx.xxx.xxx.159     *****
 65 xx.xxx.xxx.66       ****
 63 xx.xxx.xx.28        ****
 60 xx.xxx.xxx.104      ****
 59 xxx.xxx.xxx.242     ****
 59 xxx.xx.xxx.66       ****
 56 xx.xxx.xxx.202      ****
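Plugged back into your original pipeline (assuming the client address is still the first field of access_log, as in your tail command), the whole thing would look something like this:

$ tail -1000 access_log | awk '{ print $1 }' | \
> sort | uniq -c | sort -nr | \
> awk 'NR==1 { scale = $1/50 }
>      { printf("\n%-23s ", $0)
>        for (i = 0; i < ($1/scale); i++) printf("*")
>      }' | head -20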
This kind of output has a human-factors problem, though. People judge graphs like these by the length and area of the bars (the rows of asterisks); I'm not sure where I learned this, maybe from Tufte's books, or from studying statistics. Since this display scales itself to the magnitude of the numbers, you can't reliably compare two of these graphs by eye. The scaling might mean that the longest line on one graph represents 800, while an identical line on another graph represents only 100. Your eyes and brain want to believe those two are roughly equal, even though one is eight times as big as the other, and even though you can see the raw numbers.
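If you need graphs you can compare, one workaround is to pin the scale instead of deriving it from the data. Here's a minimal sketch that passes a fixed scale in with awk -v; the value 20 is an arbitrary choice of mine, and every graph you want to compare has to use the same value:

$ sort ip.txt | uniq -c | sort -nr | \
> awk -v scale=20 '{ printf("\n%-23s ", $0)
>                    for (i = 0; i < ($1/scale); i++) printf("*")
>                  }' | head -20

Now a count of 824 is always 42 asterisks, on this graph or any other, so bars of equal length really do represent roughly equal counts. The trade-off is that you have to pick a scale large enough that your biggest counts still fit across the screen.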