I am running nslookup on several URLs for multiple iterations from a shell script. I need to count how many times each IP was returned for each URL.

In the output file, the results are stored as:

URL 
IP address

Using uniq -c, I get the correct count when identical IP addresses are on adjacent lines, but not when they are on non-adjacent lines.

The command is:
cat file.log | awk '{print $1}' | uniq -c

Here is the sample output:

1 url
3 72.51.46.230

If multiple IP addresses are returned for a particular URL, the same address can end up on non-adjacent lines because I run several iterations. In that case uniq -c does not give the right count. If I sort the file first, the counts are correct, but the URLs get sorted too, and I need to display the output as above for each URL, i.e. the URL followed by lines with each count and its IP address.

For example, if I run nslookup on google.com, it returns multiple addresses, and uniq -c gives the following output. As you can see, the same IP addresses appear more than once, but each count is only 1 because uniq -c does not merge non-adjacent lines.

  1 74.125.236.64
  1 74.125.236.78
  1 74.125.236.67
  1 74.125.236.72
  1 74.125.236.65
  1 74.125.236.73
  1 74.125.236.70
  1 74.125.236.66
  1 74.125.236.68
  1 74.125.236.71
  1 74.125.236.69
  1 nslookup: can't resolv 'google.com'
  1 nslookup: can't resolv 'google.com'
  1 nslookup: can't resolv 'google.com'
  1 nslookup: can't resolv 'google.com'
  1 nslookup: can't resolv 'google.com'
  1 nslookup: can't resolv 'google.com'
  1 nslookup: can't resolv 'google.com'
  1 74.125.236.70
  1 74.125.236.66
  1 74.125.236.68
  1 74.125.236.71
  1 74.125.236.69
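
The adjacency behaviour described above can be reproduced with a minimal sketch. The three sample lines and the variable names are made up for illustration; they are not taken from the real log.

```shell
# Hypothetical sample: the same IP appears on two non-adjacent lines.
sample=$(printf '%s\n' 74.125.236.70 74.125.236.66 74.125.236.70)

# uniq -c only merges adjacent duplicates, so this yields three groups of 1.
adjacent_only=$(printf '%s\n' "$sample" | uniq -c)

# Sorting first makes the duplicates adjacent, so the counts come out right.
sorted_counts=$(printf '%s\n' "$sample" | sort | uniq -c)

printf 'uniq -c alone:\n%s\n\nsort | uniq -c:\n%s\n' "$adjacent_only" "$sorted_counts"
```

Sorting fixes the counts but, as noted, it also reorders the whole file, which is why it cannot be applied to the log as a whole without losing the URL grouping.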

I also tried awk, but then the output is not formatted the way I need.

Awk command:

awk '{a[$0]++}END{for (i in a) printf "%-2d -> %s \n", a[i], i}' file.log

Can you suggest a better way to get the counts and display them in the format described above?

The desired output format is:

URL
Count IP address

Sample input file:

URL1
72.51.46.230
72.51.46.230
google.com
74.125.236.64
74.125.236.78
(null)
nslookup: can't resolv 'google.com'
nslookup: can't resolv 'google.com'
nslookup: can't resolv 'google.com'
nslookup: can't resolv 'google.com'
nslookup: can't resolv 'google.com'

Required sample output:

URL1
2 72.51.46.230
google.com
1 74.125.236.64
1 74.125.236.78
1 null
5 nslookup: can't resolv 'google.com'

Thank you.

user2074894

3 Answers

2

The following awk script does the job:

$1~/[a-z]+[.].*/{         # A first field with a letter followed by a dot must be a URL
    for(i in ip)          # Print all the counts and IPs (nothing to print the first time)
        print ip[i],i
    delete ip             # Clear the array for the next set of IPs
    print                 # Print the URL
    next                  # Skip to the next line
}
{
    ip[$0]++              # Otherwise the line is an IP (or an error): increment its count
}
END{                      # At end of file, print the last set of IPs
    for(i in ip)
        print ip[i],i
}

Save it as script.awk and run like:

$ awk -f script.awk file
creativecommons.org
2 72.51.46.230
google.com
5 nslookup: can't resolv 'google.com'
1 (null)
1 74.125.236.64
1 74.125.236.78
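
Note that awk does not specify the iteration order of `for (i in ip)`, which is why the run above lists the error line before the IPs. If the lines should come out in the order they were first seen under each URL, one possible variation is sketched below. It is an assumption-laden sketch, not part of the original answer: the names `emit`, `count`, `order`, and `n` are introduced here for illustration, and the input is a shortened copy of the question's sample with creativecommons.org in place of URL1.

```shell
# Shortened sample input, written to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
creativecommons.org
72.51.46.230
72.51.46.230
google.com
74.125.236.64
74.125.236.78
(null)
nslookup: can't resolv 'google.com'
nslookup: can't resolv 'google.com'
EOF

result=$(awk '
# Print the counts for the current URL in first-seen order, then reset.
function emit(   k) {
    for (k = 1; k <= n; k++)
        print count[order[k]], order[k]
    split("", count)    # portable way to empty an array
    split("", order)
    n = 0
}
$1 ~ /[a-z]+[.].*/ { emit(); print; next }    # same URL test as the answer
{ if (!($0 in count)) order[++n] = $0; count[$0]++ }
END { emit() }
' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

The `order` array records each distinct line the first time it appears, so the counts are printed in input order instead of hash order.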
Chris Seymour
  • Thank you, this worked perfectly. Can you let me know what `next` and `[a-z]` do? – user2074894 Apr 12 '13 at 08:59
  • @sudo-o Can you help me take errors into account in the output as well? As shown above, if an error such as `nslookup: can't resolv` is returned in place of null, how do I count it and print the error count below the URL? – user2074894 Apr 17 '13 at 02:57
  • @sudo-o I have modified the output file above, where you can see the errors. – user2074894 Apr 17 '13 at 03:50
  • I used the following command: `awk '/[a-z]/{for(i in a) print a[i],i; delete a;print; next}{a [$0]++}END{for(i in a) print a[i],i}' file.log` but in this case the IP counts are fine and only the counts for errors are not shown. It just prints the errors as they are. – user2074894 Apr 17 '13 at 04:42
  • If you have a new question then post a new question. Make sure to explain your problem clearly, and give an example input and the expected example output. – Chris Seymour Apr 17 '13 at 08:45
  • @sudo-o I have provided sample input and output above. The question is still about getting the count, but it now needs to include errors. – user2074894 Apr 17 '13 at 09:05
  • @sudo-o I modified the question as well to state that errors also need to be taken into account. – user2074894 Apr 17 '13 at 09:57
  • @user2074894 Okay, okay, I have updated the script; the modification is simple, just add a block to deal with the errors. – Chris Seymour Apr 17 '13 at 10:30
  • @sudo-o Thank you, I will give this a try. One more thing: how do I print the stderr message `nslookup: can't resolv` to the same file where the IPs are recorded? I tried `2>` at the end of the main while loop, but that overwrites the other output. – user2074894 Apr 17 '13 at 10:35
  • @sudo-o The script you gave works, but it prints the errors and their counts at the end of the file. It does not print the error after the URL for which it was received. – user2074894 Apr 17 '13 at 11:31
  • @sudo-o I meant above that the error and its count are not printed after the URL for which the error was received. – user2074894 Apr 17 '13 at 11:43
  • Post a clear question **from the start**. Do **not** change the requirements of a question multiple times. Try understanding the code I wrote. It's clearly commented, and the modifications needed to keep the errors under the corresponding URL are simple. I will **not** be making any changes to this answer now. – Chris Seymour Apr 17 '13 at 11:54
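
On the stderr question raised in the comments: one common approach is to send stderr to wherever stdout is going with `2>&1`, using a single redirection for the whole loop. The sketch below assumes this layout; `fake_lookup` is a hypothetical offline stand-in for nslookup, and the loop is a guess at the shape of the original script, which is not shown in the question.

```shell
log=$(mktemp)

# Hypothetical stand-in for nslookup so the sketch runs without a network:
# prints an IP on stdout for one host and an error on stderr for any other.
fake_lookup() {
    if [ "$1" = "creativecommons.org" ]; then
        echo "72.51.46.230"
    else
        echo "nslookup: can't resolv '$1'" >&2
    fi
}

for url in creativecommons.org google.com; do
    echo "$url"
    fake_lookup "$url"
done > "$log" 2>&1    # stdout goes to the log, and stderr follows stdout

cat "$log"
```

A separate `2> file` opens and truncates the file independently of the stdout redirection, which would explain the overwriting; `2>&1` reuses the already open descriptor, so errors land directly under the URL that produced them.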
0

Try your first command but add sort:

awk '{print $1}' file.log | sort | uniq -c
Chris Seymour
V H
  • @Kent, different URL will not point to same IP-address. As I am running same no. of iterations on each URL, i get same IP for that URL. – user2074894 Apr 12 '13 at 07:30
  • I have tried sort it does work but it also sorts the URL names so I cannot display the output in format as I desire. In above if you see for URL creativecommons.org, I ran 3 iterations and I get the count. – user2074894 Apr 12 '13 at 07:31
0

You can directly use:

awk '{a[$1]++}END{for(i in a)print a[i],i}' file.log

instead of multiple commands and piping each command's output.

If you want it without awk:

cut -f1 file.log | sort | uniq -c

will do (tab is already the default delimiter for cut, so -d is not needed).

Vijay