
I need to perform a whois lookup on a file containing IP addresses and output both the country code and the IP address into a new file. In my command so far, I extract the IP addresses and keep a unique list of those that don't match the allowed ranges. Then I run a whois lookup to find out who owns the foreign addresses. Finally, it pulls out the country code. This works great, but I can't get it to show me the IP alongside the country code, since the IP isn't included in the whois output.

What would be the best way to include the IP address in the output?

awk '{match($0,/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/); ip = substr($0,RSTART,RLENGTH); print ip}' myInputFile \
  | sort \
  | uniq \
  | grep -v '66.33\|66.128\|75.102\|216.106\|66.6' \
  | awk -F: '{ print "whois " $1 }' \
  | bash \
  | grep 'country:' \
  >> myOutputFile

I had thought about using tee, but am having trouble lining up the data in a way that makes sense. The output file should have both the IP address and the country code. It doesn't matter whether they are in a single column or in two columns.

Here is some sample input:

Dec 27 04:03:30 smtpfive sendmail[14851]: tBRA3HAx014842: to=, delay=00:00:12, xdelay=00:00:01, mailer=esmtp, pri=1681345, relay=redcondor.itctel.com. [75.102.160.236], dsn=4.3.0, stat=Deferred: 451 Recipient limit exceeded for this sender
Dec 27 04:03:30 smtpfive sendmail[14851]: tBRA3HAx014842: to=, delay=00:00:12, xdelay=00:00:01, mailer=esmtp, pri=1681345, relay=redcondor.itctel.com. [75.102.160.236], dsn=4.3.0, stat=Deferred: 451 Recipient limit exceeded for this sender

Thanks.

Charles Duffy
user3788019
  • `awk | sort | uniq | grep | awk | bash | grep` sounds a bit excessive. Maybe you can provide a sample `myInputFile` together with desired output so we may come up with a better approach. – fedorqui Dec 29 '15 at 16:21
  • FYI, it's more efficient to put `>whatever` after the `done`, rather than reopening the file every time you want to run a `whois` command. – Charles Duffy Dec 29 '15 at 16:21
  • Also, I wholeheartedly agree with @fedorqui -- I can't think of a circumstance under which your pipeline couldn't be brought down to exactly two elements and no more. (Keep in mind that `awk` can do sorting, and uniq'ing, and grepping -- inverted or otherwise) – Charles Duffy Dec 29 '15 at 16:22
  • The input file is a sendmail mail log. I can't seem to attach a file anywhere, but I included a couple of lines below. – user3788019 Dec 29 '15 at 16:24
  • Also, generating commands for bash with `awk` is, as a whole, prone to security bugs -- since `awk` doesn't have any facilities equivalent to bash's `printf %q` to safely shell-quote content. This particular case might be safe, as the user-provided content is restricted to match IP addresses, but it's a bad habit to be in. – Charles Duffy Dec 29 '15 at 16:24
  • Also, which version of bash do you have? – Charles Duffy Dec 29 '15 at 16:27
  • bash version 4.1.2(1) – user3788019 Dec 29 '15 at 16:32
  • So... you asked a question and 10 minutes later accepted the first answer you got. You're not even a TINY bit curious if there are any other suggestions for alternative ways to solve the problem, so you can consider the pros/cons of various approaches? Oh well, good luck! – Ed Morton Dec 29 '15 at 17:03
  • @EdMorton, I'd surely upvote a good answer if you saw fit to provide one. – Charles Duffy Dec 29 '15 at 22:55
  • @CharlesDuffy I generally avoid even reading questions that already have an answer selected since the OP has presumably got what they wanted and moved on so providing a possible solution would be a waste of time. Plenty of unanswered questions out there to squander time reading :-). – Ed Morton Dec 30 '15 at 15:43
  • @EdMorton, sure, but the audience to a question is more than just the OP -- it's anyone with a similar problem, assuming it's well-asked enough to be general. And if it's not, that's cause for editing or closing the question. :) – Charles Duffy Dec 30 '15 at 19:14
  • Yeah, but if a question is unanswered you can be pretty sure your effort is useful to at least one person, while if it's already answered the ROI diminishes :-). – Ed Morton Dec 31 '15 at 00:26
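Along the lines of the comments above about shrinking the pipeline, here is a sketch of a single awk pass that does the extraction, the allowed-range exclusion and the de-duplication that awk | sort | uniq | grep -v does in the question. The function name extract_foreign_ips is hypothetical. Like the question's awk, it only looks at the first IP-looking string on each line, and it prints addresses in first-seen order rather than sorted.

```shell
#!/bin/bash
# Sketch (hypothetical function name): one awk pass standing in for
# awk | sort | uniq | grep -v from the question.
# Like the question's awk, only the first IP-looking string on each line is used.
extract_foreign_ips() {
  awk '
    match($0, /[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+/) {
      ip = substr($0, RSTART, RLENGTH)
      # skip the allowed ranges, matching only at the start of the address
      if (ip ~ /^(66[.]33|66[.]128|75[.]102|216[.]106|66[.]6)[.]/) next
      # print each address the first time it is seen (first-seen order, not sorted)
      if (!seen[ip]++) print ip
    }
  '
}
```

Usage would be something like: extract_foreign_ips < myInputFile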

1 Answer


In general, iterate over your inputs as shell variables; that lets you print each input alongside the output the shell produces for it.


The script below works with bash 4.0 or newer, since it requires associative arrays:

#!/bin/bash
#      ^^^^- must not be /bin/sh, since this uses bash-only features

# read things that look vaguely like IP addresses into associative array keys
declare -A addrs=( )
while IFS= read -r ip; do
  case $ip in 66.33.*|66.128.*|75.102.*|216.106.*|66.6.*) continue;; esac
  addrs[$ip]=1
done < <(grep -E -o '[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+')

# getting country code from whois for each, printing after the ip itself
for ip in "${!addrs[@]}"; do
  country_line=$(whois "$ip" | grep -i 'country:')
  printf '%s\n' "$ip $country_line"
done

An alternative version that works with older (3.x) releases of bash, using sort -u to generate unique values rather than deduplicating inside the shell:

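# sort -u does the de-duplication, so no associative array is needed
while IFS= read -r ip; do
  # skip the allowed ranges (prefix match only)
  case $ip in 66.33.*|66.128.*|75.102.*|216.106.*|66.6.*) continue;; esac
  # print the address followed by its whois country line(s)
  printf '%s\n' "$ip $(whois "$ip" | grep -i 'country:')"
done < <(grep -E -o '[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+' | sort -u)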

It's more efficient to set up input and output redirection once for the script as a whole than to put a >> redirection after the printf itself; that would open the output file before each print and close it again afterward, incurring a substantial performance penalty. That's why the suggested invocation for this script looks something like:

countries_for_addresses </path/to/logfile >/path/to/output
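As a sketch of the same idea that keeps only the country code (giving the two-column output the question mentions), the loop below parses the whois text with awk and redirects its output once, after done, as suggested in the comments. Assumptions: the function names country_from_whois and countries_for_ips are hypothetical; the registry's whois reply is assumed to contain a line like "country: DE" or "Country: US", and only the first such line is used; "unknown" is an arbitrary placeholder for replies without one.

```shell
#!/bin/bash
# Sketch (hypothetical helper names): print "IP CC" for each address on stdin.

# Extract the country code from whois text on stdin.
# Assumes a line such as "country:  DE" or "Country:  US"; only the first is used.
country_from_whois() {
  awk 'tolower($1) == "country:" { print $2; exit }'
}

# Read one IP per line on stdin and write "ip cc" lines to the file named in $1.
# The redirection sits after "done", so the output file is opened only once.
countries_for_ips() {
  local out="$1" ip cc
  while IFS= read -r ip; do
    cc=$(whois "$ip" | country_from_whois)
    # "unknown" is an arbitrary placeholder for replies with no country line
    printf '%s %s\n' "$ip" "${cc:-unknown}"
  done > "$out"
}
```

For example: grep -E -o '[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+' /path/to/logfile | sort -u | countries_for_ips /path/to/output (this particular invocation does not filter out the allowed ranges).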
Charles Duffy
  • That worked well. Now I just need to add the bit for the grep -v '66.33\|66.128\|75.102\|216.106\|66.6' and I will be done. – user3788019 Dec 29 '15 at 16:44
  • Could also implement that logic in bash. `case $ip in 66.33.*|66.128.*|75.102.*|216.106.*|66.6.*) continue;; esac` – Charles Duffy Dec 29 '15 at 16:45
  • @user3788019, ...my suggestion above is actually a touch less buggy, since it matches those strings only in prefix position, rather than also excluding `1.2.66.6` or `1.66.6.2`, though you could also rewrite your grep: `grep -E -v '^(66[.]33|66[.]128|75[.]102|216[.]106|66[.]6)'`. – Charles Duffy Dec 29 '15 at 16:46
  • @user3788019, ...btw, if you're wondering why I prefer `[.]` to `\.`, try seeing what happens when the latter is placed inside backticks. – Charles Duffy Dec 29 '15 at 16:51
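To make the filtering comments above concrete, here is a small sketch that runs the question's grep -v and a prefix-anchored variant over a few made-up addresses. The function names are hypothetical, and the anchored pattern differs slightly from the one in the comment: it adds a trailing [.] so that, like the case pattern 66.6.*, it does not also drop 66.60.x.x.

```shell
#!/bin/bash
# Sketch: compare the question's unanchored exclusion filter with a
# prefix-anchored one. Function names and sample addresses are made up.

# Question's filter: "." matches any character and the strings may match
# anywhere in the address. "\|" in a basic regex is a GNU grep extension.
unanchored_filter() { grep -v '66.33\|66.128\|75.102\|216.106\|66.6'; }

# Anchored filter: literal dots, start-of-line anchor, and a trailing [.]
# so that, like the case pattern 66.6.*, it keeps 66.60.x.x.
anchored_filter() { grep -E -v '^(66[.]33|66[.]128|75[.]102|216[.]106|66[.]6)[.]'; }

sample_ips() { printf '%s\n' 66.33.1.2 1.2.66.6 8.66.128.9 216.106.0.1 66.60.0.1 8.8.8.8; }

# unanchored keeps only 8.8.8.8
echo "unanchored keeps:"; sample_ips | unanchored_filter
# anchored keeps 1.2.66.6, 8.66.128.9, 66.60.0.1 and 8.8.8.8
echo "anchored keeps:";   sample_ips | anchored_filter
```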