I have a small issue that I hope you can help me with. Take the following input file (generated with tcpdump):

00:20:30.812373 52
00:20:30.833678 52
00:20:30.971499 52
00:20:30.993451 52
00:20:31.067043 634
00:20:31.067075 98
00:20:31.068532 31
00:20:31.068532 59
00:20:31.068547 31
00:20:31.068547 59
00:20:31.184758 417
00:20:31.184758 445
00:20:31.184807 205
00:20:31.184807 233
00:20:31.184907 417
00:20:31.184907 445
00:20:31.184945 205
00:20:31.184945 233
00:20:31.188924 52
00:20:31.305726 60
00:20:31.479941 52
00:20:31.491047 1500
00:20:31.491100 652
00:20:31.491118 1500
00:20:31.491133 652
00:20:31.491147 1500
00:20:31.491164 1500
00:20:31.491181 1500
00:20:31.491968 1500
00:20:31.492013 399
00:20:31.492222 399
00:20:31.624795 298
00:20:31.624828 150
00:20:31.634180 798
00:20:31.749103 52
00:20:31.777212 90
00:20:31.869180 212
00:20:31.872662 1500
00:20:31.879724 652
00:20:31.879789 1500
00:20:31.879836 652
00:20:31.879853 186
00:20:31.879867 1500
00:20:31.879882 652
00:20:31.879897 1500
00:20:31.881002 1500
00:20:31.881043 748
00:20:31.883412 1462
00:20:31.883451 1500
00:20:31.885246 652
00:20:31.888708 671
00:20:31.888747 1462
00:20:31.888763 1462
00:20:31.888776 1500
00:20:31.888788 652
00:20:31.954071 1500
00:20:31.954135 1500
00:20:32.010601 1500
00:20:32.010662 1500
00:20:32.015464 1500
00:20:32.015504 1500
00:20:32.025184 1500
00:20:32.025220 757
00:20:32.037594 33
00:20:32.037594 61
00:20:32.037612 33
00:20:32.037612 61
00:20:32.141523 1462
00:20:32.141574 1462
00:20:32.142381 1500
00:20:32.146000 652
00:20:32.146035 824

I have to use awk (or something else in bash) to calculate avg_time and avg_size, where the average is calculated for each interval k (k could be a second, a minute, 30 seconds, 10 milliseconds, or any fraction of a second). The results file will contain one row with the average for each k.

I cannot skip any interval: even if k = 30 sec and there are no rows in that interval, I have to output a row in the results file with that avg_time and 0 for avg_size. The results will be plotted.

How could I do this? Thank you very much. :)

Cosmin Mihu
  • If you show us what you have tried so far, someone might help you fix it. See http://stackoverflow.com/help/how-to-ask – Niall Cosgrove May 10 '16 at 22:54
  • and how about using an example we don't have to scroll through to see AND posting the expected output given that input? – Ed Morton May 10 '16 at 23:00

1 Answer

You can do something like this with awk:

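awk -F"[:. ]" -v k=1 -v d=1000000 '{
        # original timestamp, kept for printing
        timea=$1":"$2":"$3"."$4
        # epoch seconds with the microsecond field appended, i.e. microseconds (mktime needs gawk)
        time=mktime("2000 00 00 "$1" "$2" "$3)""$4
}
NR==1{
        # the first row opens the first window
        starta=timea
        start=time
}
start<=(time-(d*k)){
        # this row lies k*d microseconds or more after the window start:
        # print the finished window and open a new one at this row
        print starta"-"enda,sum/n
        starta=timea
        start=time
        sum=0
        n=0
}
{
        # count every row exactly once in the current window
        sum+=$5
        n++
        enda=timea
}
END{
        # flush the last window (skipped when the input is empty)
        if(n) print starta"-"enda,sum/n
}' File

Here File is the input file. Time is converted to epoch time with mktime, and $4 (the microseconds) is appended to it, so all times are in microseconds. The window length is k*d microseconds, so to get the average for every second you need k=1 and d=1000000 (10^6).

A minute: k=60 and d=1000000 (10^6)

30 sec: k=30 and d=1000000 (10^6) OR k=3 and d=10000000 (10^7)

10 milliseconds: k=10 and d=1000 (10^3), etc.

For the input you have provided, the output will be (average per second):

00:20:30.812373-00:20:31.777212 454.833
00:20:31.869180-00:20:32.146035 1036.33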

Output format: starttime-endtime average
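The awk command above only prints windows that contain at least one row, and each window starts at the first row seen after the previous window closed. If every interval has to appear in the results, including empty ones with an average of 0, a variant with fixed, aligned windows might look like the sketch below. It is only a sketch under some assumptions: bin_avg is a made-up helper name, k is passed as a single number of microseconds, the input is sorted by time and does not cross midnight, and each output row shows the start time of the window rather than a start-end range.

```shell
# bin_avg K_MICROSECONDS < input
# Hypothetical helper: average size per fixed window of K microseconds,
# printing 0 for every window that contains no rows.
bin_avg() {
    awk -F'[:. ]' -v k="$1" '
    function fmt(us,    s) {
        # microseconds since midnight -> HH:MM:SS.uuuuuu
        s = int(us / 1000000)
        return sprintf("%02d:%02d:%02d.%06d", int(s / 3600), int(s / 60) % 60, s % 60, us % 1000000)
    }
    {
        # timestamp as microseconds since midnight
        t = (($1 * 60 + $2) * 60 + $3) * 1000000 + $4
        # align the first window to a multiple of k
        if (NR == 1) first = t - t % k
        bin = int((t - first) / k)
        sum[bin] += $5
        cnt[bin]++
        last = bin          # input is assumed sorted, so this ends as the highest bin
    }
    END {
        if (NR == 0) exit
        # walk every window from the first to the last, including empty ones
        for (b = 0; b <= last; b++) {
            avg = (cnt[b] > 0) ? sum[b] / cnt[b] : 0
            printf "%s %.3f\n", fmt(first + b * k), avg
        }
    }'
}
```

For one-second windows this could be called as bin_avg 1000000 < File, and for 10 milliseconds as bin_avg 10000 < File. It uses no mktime, so it should not need gawk.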

jijinp