Given a log file in the standard combined access_log format of nginx or Apache, how would you, in a UNIX shell, calculate the number of page views (i.e. total requests) from each visitor (i.e. IP address) that a given referrer once brought in?

In other words: for each visitor who found a link to your site on another site, the number of ALL requests made by that visitor.

cnst

1 Answer

The best snippet I could come up with is the following:

fgrep http://t.co/ /var/www/logs/access.log | cut -d " " -f 1 | \
fgrep -f /dev/fd/0 /var/www/logs/access.log | cut -d " " -f 1 | sort | uniq -c

What does this do?

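We first extract the IP addresses of all requests that have http://t.co/ in the log entry. (Note that this step alone only covers the requests that came directly from the referrer, not the later requests from visitors who stayed and browsed the site further. The list is not deduplicated at this point, which is harmless for the next step.)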
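Plain fgrep matches the URL anywhere in the line (request path, user agent, and so on), so this first step can be narrowed to the Referer field. The following is only a minimal sketch: `referred_ips` is a name made up here, and it assumes the combined format with the Referer as the fourth double-quote-delimited field, i.e. no literal double quotes inside the request line.

```shell
# referred_ips REF LOG: print each distinct client address that sent at least
# one request whose Referer field starts with REF.
# Assumption: combined log format, no literal double quotes in the request line.
referred_ips() {
    awk -F'"' -v ref="$1" '
        index($4, ref) == 1 {      # $4 is the Referer when splitting on double quotes
            split($1, head, " ")   # $1 is: IP ident user [timestamp]
            print head[1]
        }
    ' "$2" | sort -u
}
```

It would be called as `referred_ips http://t.co/ /var/www/logs/access.log`, and its output could be fed to the second fgrep in place of the first fgrep and cut.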

Having a list of IP addresses that were, at some point, referred from the given URL, we pipe that list to a second fgrep through stdin (/dev/fd/0), which finds all hits from those addresses. (A very inefficient alternative would have been xargs -n1 fgrep access.log -e instead of fgrep -f /dev/fd/0 access.log.)

After the second fgrep, cut leaves us with the same set of IP addresses that we had in the first step, but each one is now repeated once per request it made. Now sort, uniq -c, done. :)
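One caveat with the pipeline above: fgrep -f matches each address anywhere in the line, so 1.2.3.4 would also pull in requests from 11.2.3.4 or 1.2.3.45. A possible alternative is a two-pass awk that compares the first field exactly. This is a sketch rather than a drop-in replacement: `hits_per_referred_visitor` is a made-up name, the log must be a regular file because it is read twice, and the Referer is assumed to be the fourth double-quote-delimited field.

```shell
# hits_per_referred_visitor REF LOG: for every client address that was referred
# by REF at least once, print "count address" over all of its requests.
# Assumption: combined log format, no literal double quotes in the request line.
hits_per_referred_visitor() {
    awk -v ref="$1" '
        # Pass 1: remember addresses whose Referer field starts with ref.
        NR == FNR {
            split($0, q, "\"")               # q[4] is the Referer field
            if (index(q[4], ref) == 1) seen[$1] = 1
            next
        }
        # Pass 2: count every request from a remembered address (exact match on $1).
        $1 in seen { hits[$1]++ }
        END { for (ip in hits) print hits[ip], ip }
    ' "$2" "$2" | sort -k1,1nr -k2,2
}
```

It would be called as `hits_per_referred_visitor http://t.co/ /var/www/logs/access.log`; the output is sorted by request count, highest first.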

cnst