I have a netflow output in which certain lines are showing 'M' after bytes:

2014-05-10 14:26:49.231    10.335 UDP     114.31.254.227:24874 ->    56.213.85.253:13617        9     1139     1
2014-05-10 14:26:59.494     0.222 UDP     114.31.254.193:17769 ->   165.199.57.179:40012        3      172     1
2014-05-10 14:26:56.015     3.348 TCP      96.196.161.39:80    ->   114.31.255.131:61066     5428    7.8 M     1
2014-05-10 14:26:59.705     0.246 UDP     165.199.57.144:40007 ->   114.31.254.193:17769        3      140     1

As can be seen there is an instance of '7.8 M' which I would like to show as its true byte value, not megabytes.

I want to replace all megabyte values with their byte values (multiplying by 1,048,576).

Something along the lines of: match '[number] M ', multiply the number by 1048576, and replace the match with the result.

On lines with an M, the value occupies fields 9 and 10.

Perhaps using awk?:

cat whitespacetrim.out | grep ' M ' | cut -f 9,10 -d ' '| cut -f 1 -d ' ' | awk '{val=$1*1024*1024} END {print val}'
user3770935

3 Answers

1

One way, using the FIELDWIDTHS variable in gawk to split each line at fixed column widths:

awk 'BEGIN{FIELDWIDTHS="101 5 100"}gsub("M","",$2){$2=$2*1048576}1' test | column -t

Output

2014-05-10  14:26:49.231  10.335  UDP  114.31.254.227:24874  ->  56.213.85.253:13617   9     1139         1
2014-05-10  14:26:59.494  0.222   UDP  114.31.254.193:17769  ->  165.199.57.179:40012  3     172          1
2014-05-10  14:26:56.015  3.348   TCP  96.196.161.39:80      ->  114.31.255.131:61066  5428  8.17889e+06  1
2014-05-10  14:26:59.705  0.246   UDP  165.199.57.144:40007  ->  114.31.254.193:17769  3     140          1

Explanation

  1. Sets the column widths: the first 101 characters go into field 1, the next 5 characters (the byte value we want) become field 2, and 100 is just a catch-all width for the rest of the line.

  2. Checks whether field 2 has an M in it while also replacing that M with nothing (gsub returns the number of replacements, so the condition is true only when an M was removed)

  3. If it does then field 2 is multiplied by 1048576

  4. 1 in awk evaluates to true and the default action is to print the line.

  5. Pipe into column -t so it looks presentable :)
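
The steps above rely on gawk's FIELDWIDTHS, and the output shows 8.17889e+06 because awk converts a non-integer number back to a string with its default %.6g format. A minimal sketch of the same fixed-width idea in POSIX awk, assuming the bytes column (padding included) sits in characters 98-106 as in the sample lines; fixed_width_bytes is a hypothetical helper name:

```shell
# Hypothetical helper. Assumption: the bytes column, including its left
# padding, occupies characters 98-106 of every line (true for the sample).
fixed_width_bytes() {
    awk '{
        col = substr($0, 98, 9)                  # bytes column with padding
        if (col ~ /M/) {
            sub(/ *M/, "", col)                   # drop the unit suffix
            # %9d keeps the column width and truncates the fraction,
            # so the value prints as an integer, not as 8.17889e+06
            col = sprintf("%9d", col * 1048576)
        }
        print substr($0, 1, 97) col substr($0, 107)
    }' "$@"
}
```

Usage would be something like fixed_width_bytes test. If the column positions differ in your file, the two offsets need adjusting.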

1

Retaining the original spacing and field alignment using GNU awk for the 3rd arg to match() and \s/\S:

$ cat tst.awk
NF==11 {
    match($0,/((\S+\s+){7}\S+)((\s+\S+){2})(.*)/,a)
    $0 = a[1] sprintf("%*d",length()-length(a[1]a[5]),$9*1048576) a[5]
}
{ print }
$
$ awk -f tst.awk file
2014-05-10 14:26:49.231    10.335 UDP     114.31.254.227:24874 ->    56.213.85.253:13617        9     1139     1
2014-05-10 14:26:59.494     0.222 UDP     114.31.254.193:17769 ->   165.199.57.179:40012        3      172     1
2014-05-10 14:26:56.015     3.348 TCP      96.196.161.39:80    ->   114.31.255.131:61066     5428  8178892     1
2014-05-10 14:26:59.705     0.246 UDP     165.199.57.144:40007 ->   114.31.254.193:17769        3      140     1

The match() carves the input record up into 3 segments - the part up to and including the 8th field ((\S+\s+){7}\S+), then the 9th and 10th field plus the spaces before them ((\s+\S+){2}), then everything after it (.*) which in this case is just the final spaces followed by the 11th field.

The assignment then recreates $0 from the leading part and the trailing part with the spaces+9th+spaces+10th fields being replaced by the new calculated value padded to the original width they took up in total.
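
For awks without the three-argument match(), roughly the same carving can be sketched with the POSIX RSTART and RLENGTH variables. This is an approximation, not the script above: it assumes the value always looks like "number, one space, M", and pad_bytes is a hypothetical name:

```shell
# Hypothetical helper. Assumption: megabyte values appear as
# "<spaces><number> M", e.g. "    7.8 M".
pad_bytes() {
    awk '
    # RSTART/RLENGTH mark the span "spaces + number + space + M"
    match($0, / +[0-9]+(\.[0-9]+)? M/) {
        head = substr($0, 1, RSTART - 1)        # everything before the span
        mid  = substr($0, RSTART, RLENGTH)      # padding + value + " M"
        tail = substr($0, RSTART + RLENGTH)     # everything after the span
        sub(/ M$/, "", mid)                     # strip the unit
        # right-align the byte count in the width the span originally used
        $0 = head sprintf("%" RLENGTH "d", mid * 1048576) tail
    }
    { print }' "$@"
}
```

Called as pad_bytes file, lines without an M pass through untouched.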

Ed Morton
0

Using awk:

$ awk '/([0-9]+\.[0-9]+|[0-9]+)[[:blank:]]*M/{$9=$9*1048576;$10=""}{$1=$1}1' file
2014-05-10 14:26:49.231 10.335 UDP 114.31.254.227:24874 -> 56.213.85.253:13617 9 1139 1
2014-05-10 14:26:59.494 0.222 UDP 114.31.254.193:17769 -> 165.199.57.179:40012 3 172 1
2014-05-10 14:26:56.015 3.348 TCP 96.196.161.39:80 -> 114.31.255.131:61066 5428 8.17889e+06  1
2014-05-10 14:26:59.705 0.246 UDP 165.199.57.144:40007 -> 114.31.254.193:17769 3 140 1
Avinash Raj
  • This only works for this specific example: add an M to the value in line 2 and you will see that it gives the wrong output. Why are you deleting field 10? –  Apr 14 '15 at 13:06
  • Oh yeah, I added the M without a space, my bad :) –  Apr 14 '15 at 13:09
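
On the field 10 question in the comments: field 10 is the M itself, so a variant of the last answer can test that field directly instead of matching a regex against the whole line. A rough sketch in POSIX awk, with fields_to_bytes as a hypothetical name; like the original one-liner it collapses the column spacing:

```shell
# Hypothetical helper. Assumption: on megabyte lines the number is field 9
# and the unit "M" is field 10, as in the sample data.
fields_to_bytes() {
    awk '
    $10 == "M" {
        $9  = sprintf("%d", $9 * 1048576)   # integer string, avoids 8.17889e+06
        $10 = ""                            # remove the unit
        $0  = $0                            # re-split so the empty field vanishes
    }
    { $1 = $1 }                             # rebuild every line with single spaces
    1' "$@"
}
```

For the TCP line in the question this should give 8178892 in the bytes position.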