Let's say I have a file like this with network flow information in it (the file can contain any number of lines):

startTime                     sourceIP    destinationIP 
2015-03-31 08:47:27.671      10.0.26.48     10.0.26.255 
2015-03-31 08:47:28.108     10.50.26.180     10.90.26.255 
2015-03-31 08:47:35.015      10.0.26.74 255.255.255.255 
                         ...
2015-03-31 16:18:25.365      196.0.26.13     224.0.0.252 
2015-03-31 16:18:32.718      10.46.26.13     224.0.0.252 
2015-03-31 16:18:46.941      188.0.26.98     177.0.26.255 
2015-03-31 16:18:58.336      10.0.26.57     10.0.26.255
2015-03-31 15:53:37.451      50.0.26.13     224.0.0.252 
2015-03-31 15:53:55.086      10.0.26.13     40.30.0.252 
2015-03-31 15:53:55.097      128.0.26.13     224.0.0.252
                         ...
2015-04-01 22:38:43.500   192.168.0.109   78.57.218.154 
2015-04-01 22:38:43.500  213.159.38.184   192.168.0.109 
2015-04-01 22:38:46.359   178.250.32.43   192.168.0.109
2015-04-01 22:38:53.269  213.159.38.184   192.168.0.109 
2015-04-01 22:38:53.269   192.168.0.109  213.159.38.184 
2015-04-01 22:39:14.995    54.83.28.184   192.168.0.109

What I want to do is determine whether newly appeared IP addresses were not listed anywhere above, so I can flag them as new and save them somewhere else. I would still consider an address new if it has only appeared within the last few days.

What would be the best way to do this in Perl?

  • Read a line. If it is "young" enough, check the IP address against the keys of a hash. If the hash contains the IP address, it has appeared before; if not, it is a new address. But I'm assuming this logic is for a script you would run whenever you need the information. – AntonH Apr 02 '15 at 07:02
  • Does this file grow continuously (like syslog), and do you read it like "tail -f"? Or do you parse the whole file each time and then apply the logic? – Ken Cheung Apr 02 '15 at 07:02
  • Agree with AntonH: use a hash with the IP as key and the date-time as value. If the hash entry is undef, it's new. If it is defined, compare the value (date-time) for the difference (1 or 2 days). Remember to update the hash value from the newly read line. – Ken Cheung Apr 02 '15 at 07:04
  • The file is not updated continuously; it is updated only when you need to run the logic. – Džiugas Balčytis Apr 02 '15 at 07:35
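
The hash idea from the comments could be sketched roughly like this. It is only a minimal illustration under some assumptions: the helper names (`to_epoch`, `first_seen`, `new_ips`) are made up for this example, timestamps are treated as UTC, fractions of a second are dropped, and "new" means "first seen within N days before a reference time" that is fixed here for the demo.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: ("2015-03-31", "08:47:27.671") -> epoch seconds.
# Timestamps are treated as UTC and fractions of a second are dropped.
sub to_epoch {
    my ( $date, $time ) = @_;
    my ( $year, $mon, $day ) = split /-/, $date;
    my ( $hour, $min, $sec ) = split /:/, $time;
    return timegm( int($sec), $min, $hour, $day, $mon - 1, $year );
}

# Map every IP (source or destination) to the epoch time of its earliest line.
sub first_seen {
    my ($lines) = @_;
    my %first;
    for my $line (@$lines) {
        my ( $date, $time, @ips ) = split ' ', $line;
        # Skip the header, "..." lines and anything else that is not a record.
        next unless @ips == 2 && $date =~ /^\d{4}-\d\d-\d\d$/;
        my $t = to_epoch( $date, $time );
        for my $ip (@ips) {
            $first{$ip} = $t if !exists $first{$ip} || $t < $first{$ip};
        }
    }
    return \%first;
}

# IPs whose earliest appearance lies within $days days before $now.
sub new_ips {
    my ( $first, $now, $days ) = @_;
    my $cutoff = $now - $days * 24 * 60 * 60;
    my @new = grep { $first->{$_} >= $cutoff } keys %$first;
    return sort @new;
}

# A few lines taken from the question, plus the header to show it is skipped.
my @sample = (
    'startTime                     sourceIP    destinationIP',
    '2015-03-31 08:47:27.671      10.0.26.48     10.0.26.255',
    '2015-04-01 22:38:43.500   192.168.0.109   78.57.218.154',
    '2015-04-01 22:38:46.359   178.250.32.43   192.168.0.109',
);
my $now = to_epoch( '2015-04-01', '22:39:00' );    # fixed reference time for the demo
print "$_\n" for new_ips( first_seen( \@sample ), $now, 1 );
```

With a one-day window this prints 178.250.32.43, 192.168.0.109 and 78.57.218.154, one per line. Keeping the earliest time per IP (rather than the first one read) means the result does not depend on the lines being in chronological order.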

1 Answer

A hash is the usual tool for this kind of task. Assume we have a defined point in time from which an IP is considered new.

use strict;
use warnings;

# Split "YYYY-MM-DD HH:MM:SS" (or $_ when called without arguments) into fields.
sub parse_time {
    local $_ = shift if @_;
    split /[-\s:]+/;
}

# Compare the array behind $ref with the remaining arguments, field by field.
sub cmp_array {
    my $ref = shift;
    for my $i ( 0 .. $#$ref ) {
        my $cmp = $ref->[$i] <=> $_[$i];
        return $cmp if $cmp;
    }
    return 0;    # all fields equal
}

die "Not enough parameters" unless @ARGV;

my $since = [ parse_time(shift) ];
my %seen;
while (<>) {
    my ( $date, $time, @ips ) = split;
    next unless @ips > 1;  # expect at least two IPs, otherwise the data is malformed
    # The flip-flop turns on at the first line later than $since and stays on.
    if ( cmp_array( $since, parse_time("$date $time") ) < 0 .. 0 ) {
        exists $seen{$_} or print "$date $time $_\n" for @ips;
    }
    @seen{@ips} = ();
}

Example output

$ perl code.pl '2015-04-01 22:38:43' file.txt
2015-04-01 22:38:43.500 192.168.0.109
2015-04-01 22:38:43.500 78.57.218.154
2015-04-01 22:38:43.500 213.159.38.184
2015-04-01 22:38:46.359 178.250.32.43
2015-04-01 22:39:14.995 54.83.28.184
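
As a side note, because these timestamps are zero-padded and fixed-format, plain string comparison orders them chronologically, so the same filter can be sketched without `parse_time` and `cmp_array`. This is a variation, not the script above: the function name `first_seen_since` is made up, every line is compared on its own (there is no flip-flop), and a line exactly at the cutoff counts as "since".

```perl
use strict;
use warnings;

# Hypothetical helper: returns "date time ip" strings for every IP whose
# first occurrence in @lines carries a timestamp at or after $since.
sub first_seen_since {
    my ( $since, @lines ) = @_;
    my ( %seen, @report );
    for my $line (@lines) {
        my ( $date, $time, @ips ) = split ' ', $line;
        next unless @ips > 1;    # header, "..." or malformed line
        my $stamp = "$date $time";
        for my $ip (@ips) {
            next if $seen{$ip}++;    # already seen earlier in the input
            # Zero-padded ISO-style stamps sort correctly as plain strings.
            push @report, "$stamp $ip" if $stamp ge $since;
        }
    }
    return @report;
}

# Three lines taken from the question.
my @demo = (
    '2015-03-31 16:18:58.336      10.0.26.57     10.0.26.255',
    '2015-04-01 22:38:43.500   192.168.0.109   78.57.218.154',
    '2015-04-01 22:38:43.500  213.159.38.184   192.168.0.109',
);
print "$_\n" for first_seen_since( '2015-04-01 22:38:43', @demo );
```

On these three lines it reports 192.168.0.109, 78.57.218.154 and 213.159.38.184, each with the 22:38:43.500 timestamp.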

If you would like to know whether some IP first appeared in the last two days, you can use

perl code.pl "$(date --date='2 days ago' '+%Y-%m-%d %H:%M:%S')" file.txt | grep 192.168.0.109

For example

$ perl code.pl "$(date --date='9 days ago' '+%Y-%m-%d %H:%M:%S')" file.txt | grep -q 192.168.0.109 && echo NEWBIE || echo OLD DOG
NEWBIE

But in this case you don't need Perl at all:

( cat file.txt; echo $(date --date='2 days ago' '+%Y-%m-%d %H:%M:%S') MY_SUPER_DELIMITER ) |
sed 's/\s\+/\t/g' | sort | cut -f 3,4 | sed '/^MY_SUPER_DELIMITER$/,$d' |
grep -q 192.168.0.109 && echo OLD DOG || echo NEWBIE
Hynek -Pichi- Vychodil
  • In your example, what would work for me is being able to search not by date but by IP address. Instead of writing '2015-04-01 22:38:43' I want to write an IP address like 192.168.0.109. And if this IP address appeared ONLY in the past two days, I would consider it new and export it to a file. Any solutions? – Džiugas Balčytis Apr 10 '15 at 07:54
  • @DžiugasBalčytis: I'm not sure I understand you. So you want a hard-coded limit of _last two days_ and you are willing to enter the IP by hand? It sounds weird to me. Anyway, you can modify my solution or write your own. It is even simpler: just split your file by date and then grep for the IP. – Hynek -Pichi- Vychodil Apr 10 '15 at 10:13
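
The lookup by IP discussed in these comments might look something like the sketch below. It is only an illustration: the helper names (`earliest_stamp`, `is_new`) and the cutoff value are made up, and it relies on the timestamps being zero-padded so that string comparison matches chronological order.

```perl
use strict;
use warnings;

# Hypothetical helper: earliest "date time" stamp of any line that mentions
# $ip as source or destination, or undef if the IP never occurs.
sub earliest_stamp {
    my ( $ip, @lines ) = @_;
    my $earliest;
    for my $line (@lines) {
        my ( $date, $time, @ips ) = split ' ', $line;
        next unless @ips > 1 && grep { $_ eq $ip } @ips;
        my $stamp = "$date $time";
        $earliest = $stamp if !defined $earliest || $stamp lt $earliest;
    }
    return $earliest;
}

# An IP counts as new when it occurs at all and never occurs before $cutoff.
sub is_new {
    my ( $ip, $cutoff, @lines ) = @_;
    my $earliest = earliest_stamp( $ip, @lines );
    my $new = defined($earliest) && $earliest ge $cutoff;
    return $new ? 1 : 0;
}

# Three lines taken from the question.
my @log = (
    '2015-03-31 08:47:27.671      10.0.26.48     10.0.26.255',
    '2015-04-01 22:38:43.500   192.168.0.109   78.57.218.154',
    '2015-04-01 22:38:46.359   178.250.32.43   192.168.0.109',
);
my $cutoff = '2015-04-01 00:00:00';    # hypothetical cutoff for the demo
for my $ip ( '192.168.0.109', '10.0.26.48' ) {
    print "$ip: ", ( is_new( $ip, $cutoff, @log ) ? 'new' : 'old' ), "\n";
}
```

This prints `192.168.0.109: new` and `10.0.26.48: old`. An IP that does not occur in the file at all is reported as not new; whether that is the desired behaviour depends on how the result is used.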