
I have a large CSV file with 150 columns; a sample is given below:

id,c1,c2,c3,c4,c5...
1,0,acc,123.4E+03,0,bdd,...
2,1.299E-05,bef,1.666E-08,23,ghh....

As you can see, some fields hold values in scientific notation. Which columns contain such values is not known in advance, and the CSV file has over 5 billion rows.

I need to convert the values in scientific notation to their corresponding decimal form. I came across the following solution, "Convert scientific notation to decimal in multiple fields", and obtained the following code:

#!/usr/bin/awk -f
BEGIN {
    d = "[[:digit:]]"
    OFS = FS = ","
}
{
    delim = ""
    for (i = 1; i <= NF; i++) {
        if ($i ~ d "E+" d d d "$") {
            printf "%s%.41f", delim, $i
        }
        else {
            printf "%s%s", delim, $i
        }
        delim = OFS
    }
    printf "\n"
}

But the above script is not working for me: it returns my input file as is (for both E+ and E- values), without any conversion. I'm fairly new to shell scripting. Any ideas?

I'm executing the script in this form:

chmod u+x awkscript.awk
./awkscript.awk inputfile.csv
Zaire
  • If this gets closed for being a duplicate, please at least copy the `CONVFMT` solution to the post linked above. – James Brown Dec 31 '16 at 09:25
  • Where does the csv come from? Excel xlsx, database? Perhaps you could change the cell format or export options avoiding the numeric format. You might even win extra precision. – Walter A Dec 31 '16 at 09:45

1 Answer


This may help you

Input

$ cat f
id,c1,c2,c3,c4,c5...
1,0,acc,123.4E+03,0,bdd,...
2,1.299E-05,bef,1.666E-08,23,ghh....

Output

$ awk 'BEGIN{CONVFMT="%.9f"; FS=OFS=","}{for(i=1; i<=NF; i++)if($i~/^[0-9]+([eE][+-][0-9]+)?/)$i+=0;}1' f
id,c1,c2,c3,c4,c5...
1,0,acc,123400,0,bdd,...
2,0.000012990,bef,0.000000017,23,ghh....

From man awk:

A numeric expression is converted to string by replacing expr with sprintf(CONVFMT, expr), unless expr can be represented on the host machine as an exact integer then it is converted to sprintf("%d", expr). Sprintf() is an AWK built-in that duplicates the functionality of sprintf(3), and CONVFMT is a built-in variable used for internal conversion from number to string and initialized to "%.6g". Explicit type conversions can be forced, expr "" is string and expr+0 is numeric.
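The quoted behaviour can be checked with a small sketch. The function name `convfmt_demo` is hypothetical and exists only for this illustration; it sets `CONVFMT` to its first argument and forces two numbers to strings by concatenating them with `""`:

```shell
# Hypothetical helper: show how CONVFMT affects number-to-string conversion.
convfmt_demo() {
    awk -v fmt="$1" 'BEGIN {
        CONVFMT = fmt
        x = 1.299E-05      # not an integer: converted with CONVFMT
        y = 123.4E+03      # exactly 123400: converted as an integer
        print (x ""), (y "")
    }'
}

convfmt_demo '%.6g'    # the default format
convfmt_demo '%.9f'    # the format used in the command above
```

With the default `%.6g` the first value should stay in exponent form (`1.299e-05`), while `%.9f` should give `0.000012990`. The second value prints as `123400` in both cases, because it is an exact integer.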

So you can either set the CONVFMT variable in the BEGIN block or format each field explicitly.
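As for why the script in the question prints its input unchanged, a likely cause is the pattern itself: inside a regular expression, `E+` means "one or more `E`", not an `E` followed by a literal plus sign, and the pattern also demands exactly three trailing digits. The sketch below compares that pattern with one that puts the sign in a bracket expression. The function name `match_demo` is hypothetical, and `[0-9]` is used instead of `[[:digit:]]` only on the assumption that it is accepted by more awk implementations:

```shell
# Hypothetical helper: print 1 or 0 for each of two patterns tested against "$1".
match_demo() {
    printf '%s\n' "$1" | awk '{
        d = "[0-9]"
        # Pattern from the question: "+" acts as a quantifier on "E"
        plus_as_quantifier = ($0 ~ d "E+" d d d "$")
        # Alternative: the sign is a literal inside a bracket expression
        sign_in_brackets = ($0 ~ d "[eE][+-]" d "+$")
        print plus_as_quantifier, sign_in_brackets
    }'
}

match_demo '123.4E+03'   # prints: 0 1
match_demo '5EE123'      # prints: 1 0
```

The second call shows what the original pattern really accepts: a digit, a run of `E` characters, and three digits.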

Akshay Hegde
  • I tried solving this with `$i=sprintf("%.9f", $i)` but that left me with more noise, for example `123400.000000000`. Is there any sane way of removing the trailing zeros from, for example, `0.000012990`? I toyed around with `gensub` but that's GNU awk only. – James Brown Dec 31 '16 at 09:21
  • @Akshay: Brilliant buddy. You solved it. Thanks a lot. Sorry for not responding earlier, we had a black out and all lines were down. – Zaire Jan 01 '17 at 06:05
  • That command doesn't work with negative numbers. Add `[-]?` to fix that. – s.ouchene Nov 14 '21 at 18:16
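
The two points raised in the comments (negative numbers and trailing zeros) could be combined in one variant. This is a minimal sketch, not the answer's own method: the function name `sci_to_dec` is hypothetical, the nine-digit precision is carried over from the answer, and the trimming uses the standard `sub()` function rather than `gensub`. It only touches fields that have an explicit exponent, so plain integers and text pass through unchanged:

```shell
# Hypothetical helper: expand scientific notation in every field of a CSV.
# Reads the files named as arguments, or standard input if none are given.
sci_to_dec() {
    awk '
    BEGIN { FS = OFS = "," }
    {
        for (i = 1; i <= NF; i++) {
            # Optional sign, mantissa with optional fraction, mandatory exponent
            if ($i ~ /^[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)[eE][+-]?[0-9]+$/) {
                s = sprintf("%.9f", $i)   # assumed precision: 9 decimal places
                sub(/0+$/, "", s)         # drop trailing zeros
                sub(/\.$/, "", s)         # drop a dangling decimal point
                $i = s
            }
        }
        print
    }' "$@"
}

printf '%s\n' '1,0,acc,123.4E+03,0,bdd' '2,-2.5E-03,bef,1.299E-05,23,ghh' | sci_to_dec
# 1,0,acc,123400,0,bdd
# 2,-0.0025,bef,0.00001299,23,ghh
```

Values smaller than the chosen precision are rounded, so something like `1E-12` would come out as `0`. Raise the precision in the `sprintf` format if that matters for the data.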