Netflow column manipulation bash

Question

I have netflow output in which certain lines show an 'M' after the byte count:

2014-05-10 14:26:49.231    10.335 UDP     114.31.254.227:24874 ->    56.213.85.253:13617        9     1139     1
2014-05-10 14:26:59.494     0.222 UDP     114.31.254.193:17769 ->   165.199.57.179:40012        3      172     1
2014-05-10 14:26:56.015     3.348 TCP      96.196.161.39:80    ->   114.31.255.131:61066     5428    7.8 M     1
2014-05-10 14:26:59.705     0.246 UDP     165.199.57.144:40007 ->   114.31.254.193:17769        3      140     1

As can be seen, there is an instance of '7.8 M', which I would like to show as its true byte value rather than in megabytes.

I want to replace all megabyte values with their byte values (multiplying by 1,048,576).

Code along the lines of: match '[number string] M ', multiply the number by 1048576 and replace it.

On lines with an M, the value occupies columns 9 and 10 (the number followed by the M).

Perhaps using awk?

cat whitespacetrim.out | grep ' M ' | cut -f 9,10 -d ' ' | cut -f 1 -d ' ' | awk '{print $1*1024*1024}'

You don't need a chain of pipes, greps and cuts if you are using awk. And you never need a UUOC (useless use of cat). – Ed Morton, Apr 14 '15 at 13:32
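Following the comment, the grep/cut chain can collapse into a single awk call that reads the file directly. A minimal sketch, assuming the number sits in field 9 and the literal M in field 10 as in the sample lines; the function name mb_values_to_bytes is hypothetical, and printf "%d" truncates any fractional byte:

```sh
#!/bin/sh
# Hypothetical helper: print the byte value of every "<number> M" entry.
# Assumes whitespace-separated fields, number in field 9, "M" in field 10.
mb_values_to_bytes() {
    awk '$10 == "M" { printf "%d\n", $9 * 1048576 }' "$@"
}

# Usage: pass file names as arguments, or feed stdin as here.
mb_values_to_bytes <<'EOF'
2014-05-10 14:26:59.494     0.222 UDP     114.31.254.193:17769 ->   165.199.57.179:40012        3      172     1
2014-05-10 14:26:56.015     3.348 TCP      96.196.161.39:80    ->   114.31.255.131:61066     5428    7.8 M     1
EOF
# prints 8178892
```

This only extracts the converted values, as the pipeline in the question does; the answers below rewrite the lines in place.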

score 1 · Answer 1 · answered Apr 14 '15 at 13:16

One way, using gawk's FIELDWIDTHS variable to split the line on fixed column widths.

awk 'BEGIN{FIELDWIDTHS="101 5 100"}gsub("M","",$2){$2=$2*1048576}1' test | column -t

Output

2014-05-10  14:26:49.231  10.335  UDP  114.31.254.227:24874  ->  56.213.85.253:13617   9     1139         1
2014-05-10  14:26:59.494  0.222   UDP  114.31.254.193:17769  ->  165.199.57.179:40012  3     172          1
2014-05-10  14:26:56.015  3.348   TCP  96.196.161.39:80      ->  114.31.255.131:61066  5428  8.17889e+06  1
2014-05-10  14:26:59.705  0.246   UDP  165.199.57.144:40007  ->  114.31.254.193:17769  3     140          1

Explanation

Sets the column widths: field 1 is the first 101 characters (everything before the byte count), field 2 is the next 5 characters (the byte column we want), and the final width of 100 just catches everything else. These widths match this particular file's layout.
Checks whether field 2 contains an M, removing the M at the same time (gsub() returns the number of replacements, so the condition is true only if an M was found).
If it did, field 2 is multiplied by 1048576.
The trailing 1 always evaluates to true, and awk's default action is to print the line.
Pipe into column -t so it looks presentable :)
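The 8.17889e+06 in this answer's output comes from awk's default number-to-string format, "%.6g", which applies when a non-integer result such as 7.8 * 1048576 = 8178892.8 is stored back into a field. A small sketch contrasting that with an explicit sprintf("%d", ...); the function name demo_byte_formatting is hypothetical:

```sh
#!/bin/sh
# Hypothetical demo function: why field assignment shows 8.17889e+06.
demo_byte_formatting() {
    awk 'BEGIN {
        bytes = 7.8 * 1048576        # 8178892.8, not an integer
        $0 = "x"                     # a one-field record to work with
        $1 = bytes                   # converted with the default "%.6g" format
        print                        # -> 8.17889e+06
        $1 = sprintf("%d", bytes)    # explicit integer conversion (truncates)
        print                        # -> 8178892
    }'
}

demo_byte_formatting
```

Wrapping the multiplication in sprintf("%d", ...) inside the answer's action would give whole byte counts instead of exponent notation.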

Ed Morton · Answer 2 · 2015-04-14T14:59:56.907

Retaining the original spacing and field alignment using GNU awk for the 3rd arg to match() and \s/\S:

$ cat tst.awk
NF==11 {
    match($0,/((\S+\s+){7}\S+)((\s+\S+){2})(.*)/,a)
    $0 = a[1] sprintf("%*d",length()-length(a[1]a[5]),$9*1048576) a[5]
}
{ print }
$
$ awk -f tst.awk file
2014-05-10 14:26:49.231    10.335 UDP     114.31.254.227:24874 ->    56.213.85.253:13617        9     1139     1
2014-05-10 14:26:59.494     0.222 UDP     114.31.254.193:17769 ->   165.199.57.179:40012        3      172     1
2014-05-10 14:26:56.015     3.348 TCP      96.196.161.39:80    ->   114.31.255.131:61066     5428  8178892     1
2014-05-10 14:26:59.705     0.246 UDP     165.199.57.144:40007 ->   114.31.254.193:17769        3      140     1

The match() call carves the input record into 3 segments: the part up to and including the 8th field ((\S+\s+){7}\S+), then the 9th and 10th fields plus the whitespace before each of them ((\s+\S+){2}), then everything after that (.*), which in this case is just the final spaces followed by the 11th field. Those segments land in a[1], a[3] and a[5]; a[2] and a[4] hold the inner repeated groups.

The assignment then rebuilds $0 from the leading part (a[1]) and the trailing part (a[5]), replacing the spaces + 9th field + spaces + 10th field with the newly calculated value, padded via %*d to the total width those characters originally took up. Since %d truncates, 8178892.8 becomes 8178892.
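Without GNU awk, the same alignment-preserving idea can be sketched in POSIX awk: walk past the first 8 fields with the two-argument match() and RSTART/RLENGTH, then measure the "spaces + 9th + spaces + 10th" span directly. This is an illustrative variant, not the answer's code; the function name mb_to_bytes_aligned is hypothetical, and it assumes space/tab-separated fields as in the sample:

```sh
#!/bin/sh
# Hypothetical helper: POSIX-awk variant (no 3-arg match(), no \s/\S).
mb_to_bytes_aligned() {
    awk 'NF == 11 && $10 == "M" {
        pos = 0
        for (i = 1; i <= 8; i++) {                 # walk past fields 1..8
            match(substr($0, pos + 1), /[^ \t]+/)
            pos += RSTART + RLENGTH - 1            # pos = end of field i
        }
        head = substr($0, 1, pos)
        tail = substr($0, pos + 1)
        match(tail, /^[ \t]+[^ \t]+[ \t]+[^ \t]+/) # spaces, 9th, spaces, 10th
        width = RLENGTH
        # Right-align the byte count in the same total width, then re-attach the rest.
        $0 = head sprintf("%" width "d", $9 * 1048576) substr(tail, width + 1)
    }
    { print }' "$@"
}

mb_to_bytes_aligned <<'EOF'
2014-05-10 14:26:59.494     0.222 UDP     114.31.254.193:17769 ->   165.199.57.179:40012        3      172     1
2014-05-10 14:26:56.015     3.348 TCP      96.196.161.39:80    ->   114.31.255.131:61066     5428    7.8 M     1
EOF
```

Building the format string as "%" width "d" sidesteps relying on printf's * width in awks that may not support it.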

score 0 · Answer 3 · answered Apr 14 '15 at 12:59


Through awk:

$ awk '/([0-9]+\.[0-9]+|[0-9]+)[[:blank:]]*M/{$9=$9*1048576;$10=""}{$1=$1}1' file
2014-05-10 14:26:49.231 10.335 UDP 114.31.254.227:24874 -> 56.213.85.253:13617 9 1139 1
2014-05-10 14:26:59.494 0.222 UDP 114.31.254.193:17769 -> 165.199.57.179:40012 3 172 1
2014-05-10 14:26:56.015 3.348 TCP 96.196.161.39:80 -> 114.31.255.131:61066 5428 8.17889e+06  1
2014-05-10 14:26:59.705 0.246 UDP 165.199.57.144:40007 -> 114.31.254.193:17769 3 140 1

Avinash Raj

Only works for this specific example: add an M to the value in line 2 and you will see that it gives the wrong output. Why are you deleting field 10? – Apr 14 '15 at 13:06
Oh yeah, I added the M without a space, my bad :) – Apr 14 '15 at 13:09
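The problem the comments point at: the regex also accepts an M glued to the number (172M), but the action always multiplies field 9 and blanks field 10, which deletes the flow count on such a line. A sketch that handles both "7.8 M" and "7.8M" in any field, assuming the M only ever follows a plain decimal number; the function name mb_to_bytes_any is hypothetical, and whitespace is collapsed as in the answer above:

```sh
#!/bin/sh
# Hypothetical helper: convert "<number> M" and "<number>M" to bytes.
# Fields are re-joined with single spaces (OFS).
mb_to_bytes_any() {
    awk '{
        out = ""; sep = ""
        for (i = 1; i <= NF; i++) {
            f = $i
            if (f ~ /^[0-9]+(\.[0-9]+)?M$/) {                     # attached: 7.8M
                f = sprintf("%d", substr(f, 1, length(f) - 1) * 1048576)
            } else if (f ~ /^[0-9]+(\.[0-9]+)?$/ && $(i + 1) == "M") {  # 7.8 M
                f = sprintf("%d", f * 1048576)
                i++                                              # skip the lone M
            }
            out = out sep f
            sep = OFS
        }
        print out
    }' "$@"
}

# Line 2 has a hypothetical M added without a space, as in the comment.
mb_to_bytes_any <<'EOF'
2014-05-10 14:26:59.494     0.222 UDP     114.31.254.193:17769 ->   165.199.57.179:40012        3      172M     1
2014-05-10 14:26:56.015     3.348 TCP      96.196.161.39:80    ->   114.31.255.131:61066     5428    7.8 M     1
EOF
```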
