1

I have one hundred files with three fields each. Each one looks like this (with more lines):

#time data1 data2
20 1.9864547484940e+01 -3.96363547484940e+01
40 2.164547484949e+01 -3.2363547477060e+01 
60 1.9800047484940e+02 -4.06363547484940e+02
…

They are large; some of them take up to 1.5 GB. I would like to reduce their size by saving the last two columns with lower precision and dropping the e+0? exponent. For example, I would like to convert the four lines above to:

#time data1 data2
20 19.865 -39.636
40 21.645 -32.364
60 198.00 -406.36
…

I googled and came across awk's CONVFMT variable, but I don't know how to use it since I'm not experienced with awk. Is this the right tool for my case? If so, how should I use it?

I also thought of writing a C++ program, but a one-line command would be great.

dada

2 Answers

5

I would use awk's printf function:

awk 'NR==1;NR>1{printf "%d %.3f %.3f\n", $1, $2, $3}' file

The above command outputs:

#time data1 data2
20 19.865 -39.636
40 21.645 -32.364
60 198.000 -406.364
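
The output above keeps three decimals, whereas the sample in the question keeps five significant digits (198.00, -406.36). A minimal sketch of that variant follows, assuming an awk whose printf accepts the `#` flag (gawk and mawk do) and values whose magnitude stays between 1e-4 and 1e5, since `%g` falls back to exponent notation outside that range. The function name `five_sig` is hypothetical and only there for illustration.

```shell
# Hypothetical helper: print the first line (header) unchanged and every
# other line with 5 significant digits in columns 2 and 3.
# The '#' flag keeps trailing zeros, so 198.00047 becomes 198.00, not 198.
five_sig() {
    awk 'NR==1; NR>1 { printf "%d %#.5g %#.5g\n", $1, $2, $3 }' "$1"
}
```

Called as `five_sig file`, this should turn the third data line of the question into `60 198.00 -406.36`.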

Short explanation:

NR==1 evaluates to true on the first line (NR is the number of the current record). If a condition is not followed by an action (between {}), awk simply prints the line, in this case the header.

NR>1 evaluates to true on every line of input except the first. It is followed by an action, which uses printf to achieve the desired result.
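
Since the question mentions about a hundred files, the one-liner can be wrapped in a loop. The following is only a sketch: the function name `shrink_files` is hypothetical, and it assumes that overwriting the originals is acceptable and that there is enough free disk space for a temporary copy of one file at a time.

```shell
# Hypothetical wrapper: rewrite every file passed as an argument in place.
shrink_files() {
    for f in "$@"; do
        # Write to a temporary file first; the original is replaced
        # only if awk finished successfully.
        awk 'NR==1; NR>1 { printf "%d %.3f %.3f\n", $1, $2, $3 }' "$f" > "$f.tmp" \
            && mv "$f.tmp" "$f"
    done
}
```

A call could look like `shrink_files *.dat`, where the `.dat` extension is an assumption, as the question does not say how the files are named.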

hek2mgl
  • Great! Only one thing: this seems to convert headers too. How can I tell awk not to touch them? – dada Sep 01 '15 at 10:01
  • Is NR incremented after the semicolon, so that `NR>1` evaluates to true? – dada Sep 01 '15 at 10:21
  • No, the whole command is executed *per line*, meaning on every line of input. On the first line, the `NR==1` before the `;` is true; on the remaining lines, `NR>1` is true. – hek2mgl Sep 01 '15 at 10:31
  • Is it possible to detect a header line with a string test? I have headers on lines other than the very first one. – dada Sep 01 '15 at 10:49
  • Does a header line always start with `#`? – hek2mgl Sep 01 '15 at 11:01
  • Yes, they all do. So how do I test with awk whether the first character is '#'? – dada Sep 01 '15 at 14:23
  • Replace `NR==1` with `/^#/` and `NR>1` with `!/^#/`, so that header lines are printed unchanged and never reach `printf`. – hek2mgl Sep 01 '15 at 15:01
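
Pulling the comment thread together: once headers are matched with `/^#/` instead of `NR==1`, the printf rule must also skip them, otherwise a header further down the file would be printed once as-is and once mangled by printf. A minimal sketch, assuming every header line starts with `#` as stated in the comments; the function name `keep_headers` is hypothetical.

```shell
# Hypothetical helper: pass '#' lines through untouched, reformat the rest.
keep_headers() {
    awk '
        /^#/ { print; next }                        # header: print as-is, skip the rule below
        { printf "%d %.3f %.3f\n", $1, $2, $3 }     # data line: 3 decimals
    ' "$1"
}
```

The `next` statement plays the same role as negating the pattern (`!/^#/`) on the second rule.
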
0

You could use coreutils (head and tail) together with the shell's read and printf builtins:

head -n1 infile; tail -n+2 infile | while read -r n1 n2 n3; do printf "%d %.3f %.3f\n" "$n1" "$n2" "$n3"; done

Output:

#time data1 data2
20 19.865 -39.636
40 21.645 -32.364
60 198.000 -406.364
Thor