How can I get the number of flows, the flows and the packets per flow from a .pcap file?

Question

I'm working with large .pcap network captures (> 5 GB per file) and I'm trying to group the packets into flows (for example, by source IP, destination IP, source port, destination port and layer 4 protocol). I have tried tools such as Scapy, CapLoader, tcpdump and tshark, but I can't find a solution that does what I want.

For each .pcap file, I would like to know the number of flows, list the flows, and find out which packets belong to each flow.

Any idea what is the best way to proceed?

I apologize if the explanation is not very clear; I'm happy to provide any further clarification.

Thanks very much.

Regards.

ND Geek · Answer 1 · 2015-05-18T18:37:41.397

If you're willing to write some code, a Perl script might be useful. Over at Ask Wireshark, someone asked a somewhat similar question and got a response suggesting the use of Net::Pcap and a handful of NetPacket::* packages to process the files yourself. That answer has a sample script aimed at the specific question over there, but it wouldn't be hard to derive something to get at the information you want.

EDIT: I thought this was interesting so I wrote up a sample script.

pcapFlows.pl:

#!/usr/bin/perl

use strict;
use warnings;

use Net::Pcap;
use NetPacket::Ethernet qw(:types :strip);
use NetPacket::IP qw(:protos :strip);
use NetPacket::TCP;
use NetPacket::UDP;

my $pcap_file = $ARGV[0];

if (not $pcap_file) {
    die("ERROR: please give pcap file name on the cli\n");
}

my $err = undef;

my %tracks;

# read data from pcap file.
my $pcap = pcap_open_offline($pcap_file, \$err) or die "Can't read $pcap_file : $err\n";
pcap_loop($pcap, -1, \&process_packet, \%tracks);

my $flowCount = 0;

# Write one pcap file per flow and print a summary line for each.
# $pcap must stay open here because pcap_dump_open() needs a live handle.
print "Flows\n";

foreach my $src (keys %tracks) {
    foreach my $dst (keys %{$tracks{$src}}) {
        foreach my $prot (keys %{$tracks{$src}{$dst}}) {
            $flowCount++;
            my $filename = "${src}_to_${dst}_$prot.pcap";
            print "$filename\n";

            my ($source, $dest) = ($src, $dst);

            # Keys are stored as "ip-port"; display them as "ip:port".
            $source =~ s/-/:/;
            $dest =~ s/-/:/;
            my $pktCount = 0;
            my $pcap_dumper = pcap_dump_open($pcap, $filename)
                or die "Can't write $filename: " . pcap_geterr($pcap) . "\n";
            foreach my $packet (@{$tracks{$src}{$dst}{$prot}{'packets'}}) {
                $pktCount++;
                pcap_dump($pcap_dumper, $packet->{'hdr'}, $packet->{'pkt'});
            }
            pcap_dump_flush($pcap_dumper);
            pcap_dump_close($pcap_dumper);

            print "$source <-> $dest ($pktCount)\n";
        }
    }
}

# close the capture only after all flow files have been written
pcap_close($pcap);

print "$flowCount flows found.\n";

sub process_packet {
    my ($user_data, $header, $packet) = @_;

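    # Skip anything that is not IPv4 over Ethernet (ARP, IPv6, ...).
    my $eth = NetPacket::Ethernet->decode($packet);
    return unless defined $eth->{type} && $eth->{type} == ETH_TYPE_IP;

    my $ip = NetPacket::IP->decode($eth->{data});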

    my $src_ip = $ip->{src_ip};
    my $dst_ip = $ip->{dest_ip};

    my ($prot, $payload);

    if ($ip->{proto} == IP_PROTO_TCP) {
        $prot = 'tcp';
        $payload = NetPacket::TCP->decode($ip->{data});
    } elsif ($ip->{proto} == IP_PROTO_UDP) {
        $prot = 'udp';
        $payload = NetPacket::UDP->decode($ip->{data});
    } else {
        return;
    }

    my $src_port = $payload->{src_port};
    my $dst_port = $payload->{dest_port};


    # Group both directions under one flow: reuse the entry keyed by the
    # destination endpoint if only that one exists, otherwise key by source.
    my ($first, $second) = ("$src_ip-$src_port", "$dst_ip-$dst_port");
    if (!defined($user_data->{$first}) && defined($user_data->{$second})) {
        ($first, $second) = ($second, $first);
    }
    $user_data->{$first}{$second}{$prot}{'count'}++;
    push(@{$user_data->{$first}{$second}{$prot}{'packets'}}, {'hdr' => $header, 'pkt' => $packet});
}
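
The script above decides the direction of a flow by checking which endpoint it saw first. An alternative worth considering is to build one direction-independent key per packet by sorting the two endpoints, so that A->B and B->A always land in the same bucket. The following is only a minimal sketch of that idea: flow_key is a hypothetical helper (not part of Net::Pcap or NetPacket), and the packet list is made-up sample data standing in for what process_packet would extract.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a direction-independent flow key.
# The two "ip:port" endpoints are sorted as strings, so swapping
# source and destination yields the same key.
sub flow_key {
    my ($src_ip, $src_port, $dst_ip, $dst_port, $prot) = @_;
    my @ends = sort { $a cmp $b } ("${src_ip}:${src_port}", "${dst_ip}:${dst_port}");
    return join('|', @ends, $prot);
}

# Made-up sample tuples: (src ip, src port, dst ip, dst port, protocol).
my @packets = (
    ['10.0.0.1', 1234, '10.0.0.2', 80,   'tcp'],
    ['10.0.0.2', 80,   '10.0.0.1', 1234, 'tcp'],
    ['10.0.0.1', 5353, '10.0.0.3', 53,   'udp'],
);

# Count packets per flow; only counters are kept, not the packets.
my %flows;
$flows{ flow_key(@$_) }++ for @packets;

printf "%d flows found.\n", scalar keys %flows;
print "$_ ($flows{$_})\n" for sort keys %flows;
```

Because this variant keeps only a counter per flow rather than a copy of every packet, it may be easier on memory for multi-gigabyte captures; writing per-flow pcap files would still need the packets themselves or a second pass over the capture.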
