Extract IP address from a full line after slurping file and using regex to match IP address

Question

I have written the following code to read a file, slurp, identify IP addresses and track the number of occurrences of each address using a hash structure. The problem is that instead of my key being the IP address matched from regex, the key is the entire line on which the IP address appears. How do I fix this? (I believe the issue has to do with the fact that slurping is done line by line)

%ipcount;

@fileslurp = <FH>;
foreach(@fileslurp){
    if($_ =~ m/(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/){
        $ipcount{$_}++;
    }
}

$numIP = scalar keys %ipcount;

print "Number of unique IP: $numIP \n"; 

foreach $ipaddress (sort { $ipcount{b} <=> $ipcount{a} } keys %ipcount){
    print "$ipaddress: $ipcount{$ipaddress} \n";
}

score 1 · Accepted Answer · edited Jun 17 '14 at 22:40

It looks like you are already capturing the address with a group, so just change $_ to $1 when adding to the hash.

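use strict;
use warnings;

my %ipcount;

# FH is assumed to be a filehandle already opened on the input file.
my @fileslurp = <FH>;
foreach (@fileslurp) {
    # $1 holds the text captured by the parentheses, i.e. the IP address.
    if (m/(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/) {
        $ipcount{$1}++;
    }
}

my $numIP = scalar keys %ipcount;

print "Number of unique IP: $numIP \n";

foreach my $ipaddress (sort keys %ipcount) {
    print "$ipaddress: $ipcount{$ipaddress} \n";
}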

Get in the habit of putting use strict; and use warnings; in EVERY Perl script. They will help you catch problems.
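As a rough illustration of what those pragmas catch: the sort block in the question uses the barewords b and a as hash keys instead of $b and $a, so it compares two nonexistent entries, and use warnings would report the uninitialized values. The sketch below sorts by count with the sort variables spelled correctly. It reads from a made-up in-memory list (@lines is a hypothetical stand-in for the lines read from FH), and the tie-break on the address string is my own addition to keep the output order stable.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the lines read from FH.
my @lines = (
    "10.0.0.1 - GET /index.html\n",
    "10.0.0.2 - GET /about.html\n",
    "10.0.0.1 - GET /contact.html\n",
);

my %ipcount;
foreach my $line (@lines) {
    # Key on $1 (the captured address), not on the whole line.
    if ($line =~ m/(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/) {
        $ipcount{$1}++;
    }
}

# Most frequent first; ties are broken by address string.
my @by_count = sort { $ipcount{$b} <=> $ipcount{$a} or $a cmp $b } keys %ipcount;

foreach my $ipaddress (@by_count) {
    print "$ipaddress: $ipcount{$ipaddress}\n";
}
```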

score 0 · Answer 2 · answered Jun 17 '14 at 05:16

Notice $ipcount{$_}: here you are using $_, which holds the whole line. Change it to $ipcount{$1}, where $1 holds the captured IP address.

One more thing: your regex for matching an IP address is looser than it may appear. Matching an IP address is a good example of the trade-off between regex complexity and exactness. \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b will match any IP address just fine, but it will also match 999.999.999.999 as if it were a valid IP address. Whether this is a problem depends on the files or data you intend to apply the regex to. To restrict all 4 numbers in the IP address to 0..255, you can use the following regex. It stores each of the 4 numbers of the IP address in a capturing group, so you can use these groups to process the address further. It is written in free-spacing mode (the /x modifier in Perl) so that it fits the width of the page.

\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
  (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
  (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
  (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b

If you don't need access to the individual numbers, you can shorten the regex with a quantifier to:

\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b

Similarly, you can shorten the quick regex to \b(?:\d{1,3}\.){3}\d{1,3}\b
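To make the trade-off concrete, here is a small sketch that runs the quick pattern and the 0..255 pattern against a few strings. The sample strings and the variable names ($quick, $exact, $octet, @samples) are made up for illustration. The stricter pattern is built from a reusable $octet piece and written with /x, which is just one way to lay it out.

```perl
use strict;
use warnings;

# Quick pattern: any four dot-separated groups of 1 to 3 digits.
my $quick = qr/\b(?:\d{1,3}\.){3}\d{1,3}\b/;

# Stricter pattern: each group limited to 0..255. The /x modifier
# (free-spacing mode) lets the pattern span several lines.
my $octet = qr/25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?/;
my $exact = qr/
    \b (?: $octet \. ){3}
       $octet \b
/x;

# Hypothetical test strings, not taken from the original question.
my @samples = ('192.168.1.10', '999.999.999.999', 'host 10.0.0.255 up');

my (%quick_hit, %exact_hit);
foreach my $s (@samples) {
    $quick_hit{$s} = ($s =~ $quick) ? 1 : 0;
    $exact_hit{$s} = ($s =~ $exact) ? 1 : 0;
    print "$s: quick=$quick_hit{$s} exact=$exact_hit{$s}\n";
}
```

With these samples, both patterns accept the two real-looking addresses, and only the quick pattern accepts 999.999.999.999.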

The Regexp::Common::net module, part of Regexp::Common, may also provide the regex you need.
