
I'd like to convert all number abbreviations such as 1K, 100K, 1M, etc. in a text file into plain numeric values such as 1000, 100000, 1000000, etc.

So for example, if I have the following text file:

1.3K apples
87.9K oranges
156K mangos
541.7K carrots
1.8M potatoes

I would like to convert it to the following in bash:

1300 apples
87900 oranges
156000 mangos
541700 carrots
1800000 potatoes

So far I have been replacing each abbreviation string with its full numeric value, like so:

sed -e 's/1K/1000/g' -e 's/1M/1000000/g' text-file.txt

My problem is that this approach can't find and replace ALL of the possible abbreviations when the numbers vary. I'd like it to handle at least abbreviations with up to one decimal place.

oguz ismail
chiappa

10 Answers

Use numfmt from GNU coreutils; don't reinvent the wheel.

$ numfmt --from=si <file
1300 apples
87900 oranges
156000 mangos
541700 carrots
1800000 potatoes

If abbreviated numbers may appear in any field, you can use:

numfmt --from=si --field=- --invalid=ignore <file
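numfmt ships with GNU coreutils, so it may be missing on macOS or other non-GNU systems (Homebrew's coreutils package installs it as gnumfmt). Where it isn't available, a rough POSIX awk stand-in for the --field=- variant might look like the sketch below. The function name and the K/M/G table are assumptions; unlike numfmt it only knows those three suffixes and leaves every other field untouched.

```shell
# Hypothetical stand-in for: numfmt --from=si --field=- --invalid=ignore
# Only K, M and G (either case) are recognised; other fields are left alone.
expand_all_fields() {
  awk '
  BEGIN { mult["K"] = 1000; mult["M"] = 1000000; mult["G"] = 1000000000 }
  {
    for (i = 1; i <= NF; i++) {
      # digits, an optional decimal part, then exactly one suffix letter
      if ($i ~ /^[0-9]+(\.[0-9]+)?[KkMmGg]$/) {
        s = toupper(substr($i, length($i), 1))
        # %.0f prints the product as a whole number (rounded)
        $i = sprintf("%.0f", substr($i, 1, length($i) - 1) * mult[s])
      }
    }
    print
  }'
}

printf 'apples 1.3K and 2M pears\n' | expand_all_fields
```

As with the other awk answers, changing a field makes awk rejoin the line with single spaces.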
oguz ismail

Could you please try the following, written and tested with the shown samples in GNU awk.

awk '
{
  if(sub(/[kK]$/,"",$1)){
    $1*=1000
  }
  if(sub(/[mM]$/,"",$1)){
    $1*=1000000
  }
}
1
' Input_file

Explanation: a detailed, commented version of the above.

awk '                     ##Start the awk program here.
{
  if(sub(/[kK]$/,"",$1)){ ##If the 1st field ends with k/K, remove that suffix (sub() returns 1 on success) and do the following.
    $1*=1000              ##Multiply the 1st field value by 1000.
  }
  if(sub(/[mM]$/,"",$1)){ ##If the 1st field ends with m/M, remove that suffix and do the following.
    $1*=1000000           ##Multiply the 1st field value by 1000000.
  }
}
1                         ##1 is an always-true condition, so the current line is printed.
' Input_file              ##Name of the input file.

Output will be as follows.

1300 apples
87900 oranges
156000 mangos
541700 carrots
1800000 potatoes
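Two details of this approach are worth spelling out: sub() returns the number of substitutions it made, so each multiplication only runs when that suffix was actually removed; and assigning to $1 makes awk rebuild the whole line with OFS (a single space by default), so a run of blanks between the number and the name collapses to one space. The function name below is hypothetical; it just wraps the same program to show the second point.

```shell
# Hypothetical wrapper around the awk program from this answer
expand_first_field() {
  awk '
  {
    if (sub(/[kK]$/, "", $1)) $1 *= 1000      # runs only if a k/K was removed
    if (sub(/[mM]$/, "", $1)) $1 *= 1000000   # runs only if an m/M was removed
  }
  1'
}

# The four spaces after 1.3K come out as one, because $0 is rebuilt with OFS
printf '1.3K    apples\n' | expand_first_field
```

If the original spacing matters, the perl answer, which substitutes inside the line text, leaves it intact.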
Ed Morton
RavinderSingh13

Another awk variant:

awk '{q = substr($1, length($1));
$1 *= (q == "M" ? 1000000 : (q=="K"?1000:1))} 1' file

1300 apples
87900 oranges
156000 mangos
541700 carrots
1800000 potatoes
anubhava

This performs a global substitution (in case you have more than one string to convert per line):

perl -pe 's{\b(\d+(?:\.\d+)?)([KM])\b}{ $1*1000**(index("KM",$2)+1) }ge' file
oguz ismail

A slightly more programmatic approach, based on this answer: create a table of all possible conversion factors and perform the multiplication when needed:

awk 'BEGIN{f["K"]=1000; f["M"]=1000000}
     match($1,/[a-zA-Z]+/){$1 *= f[substr($1,RSTART,RLENGTH)]}
     1' file
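One caveat with the table lookup: if the letters found in $1 are not a key in f (a lowercase 1.3k, or a unit such as 12kg), f[...] evaluates to an empty value and the field is multiplied by 0. A hedged variant, with a hypothetical function name, that upper-cases the suffix and only multiplies when the key exists:

```shell
# Hypothetical variant of the lookup-table answer
expand_lookup() {
  awk 'BEGIN { f["K"] = 1000; f["M"] = 1000000 }
       match($1, /[a-zA-Z]+/) {
         s = toupper(substr($1, RSTART, RLENGTH))   # "k" and "K" share one key
         if (s in f) $1 *= f[s]                     # unknown suffix: leave $1 alone
       }
       1'
}

printf '%s\n' '1.3k apples' '1.8M potatoes' '12kg flour' | expand_lookup
```

The "s in f" test does not create an element, so the table stays limited to the keys set in BEGIN.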
kvantour

With GNU awk for gensub():

$ awk '
    BEGIN { mult[""]=1; mult["k"]=1000; mult["m"]=1000000 }
    { $1 *= mult[gensub(/[^[:alpha:]]/,"","g",tolower($1))] }
1' file
1300 apples
87900 oranges
156000 mangos
541700 carrots
1800000 potatoes
Ed Morton

Another option is to use Bash alone for the matching, with a regex whose capture groups pick up either M or K. If the pattern matches, test which suffix it was, set the multiplier accordingly, and let bc do the arithmetic:

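while IFS= read -r line
do
  if [[ $line =~ ^([[:digit:]]+(\.[[:digit:]]+)?)([MK])( .*)$ ]];then
    # K -> * 1000, M -> * 1000000; "/ 1" makes bc (scale=0) drop the decimals
    echo "$(bc <<< "${BASH_REMATCH[1]} * $([ "${BASH_REMATCH[3]}" == "K" ] && echo "1000" || echo "1000000") / 1")${BASH_REMATCH[4]}"
  else
    # lines without a K/M abbreviation are printed unchanged
    printf '%s\n' "$line"
  fi
done < text-file.txt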

Output

1300 apples
87900 oranges
156000 mangos
541700 carrots
1800000 potatoes
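The loop above still starts a bc process for every line. If, as in the question, numbers have at most one decimal place, Bash integer arithmetic is enough: express the number in tenths, then multiply by the factor divided by 10. A sketch under that assumption (the function name is hypothetical; lines that don't fit the pattern, including ones with two or more decimals, pass through unchanged):

```shell
# Hypothetical helper; requires bash (uses [[ =~ ]] and BASH_REMATCH).
# Assumes at most one decimal digit, as in the question.
expand_abbrev_bash() {
  local line int frac mult
  while IFS= read -r line; do
    if [[ $line =~ ^([0-9]+)(\.([0-9]))?([KkMm])(.*)$ ]]; then
      int=${BASH_REMATCH[1]}
      frac=${BASH_REMATCH[3]:-0}          # missing decimal digit counts as 0
      case ${BASH_REMATCH[4]} in
        [Kk]) mult=100 ;;                 # 1000 / 10, since the value is in tenths
        [Mm]) mult=100000 ;;              # 1000000 / 10
      esac
      # 10# forces base 10 so a leading zero is not read as octal
      printf '%s%s\n' "$(( (10#$int * 10 + frac) * mult ))" "${BASH_REMATCH[5]}"
    else
      printf '%s\n' "$line"               # no K/M abbreviation: print unchanged
    fi
  done
}

printf '%s\n' '1.3K apples' '156K mangos' '1.8M potatoes' | expand_abbrev_bash
```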


The fourth bird

Given:

$ cat file
1.3K apples
87.9K oranges
156K mangos
541.7K carrots
1.8M potatoes

Just for giggles, pure Bash (with sed and bc):

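while read -r x y
do
    # "/1" makes bc (default scale=0) print 1300 instead of 1300.0, which printf %d would complain about
    new_x=$(echo "$x" | sed -E 's/^([[:digit:].]*)[kK]/\1*1000\/1/; s/^([[:digit:].]*)[mM]/\1*1000000\/1/' | bc)
    printf "%'d %s\n" "$new_x" "$y"
done <file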

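Prints (the ' flag in %'d adds the locale's thousands separator; use plain %d to get 1300 instead of 1,300):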

1,300 apples
87,900 oranges
156,000 mangos
541,700 carrots
1,800,000 potatoes
dawg

This might work for you (GNU sed):

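sed -E '1{x;s/^/K00M00000/;x}
        s/(^|[^.0-9])([0-9]+)([KM])/\1\2.0\3/gi
        :a;G;s/([0-9])(\.([0-9]))?([KM])(.*)\n.*\4(0*).*/\1\3\6\5/i;ta
        P;d' file

Create a lookup and store it in the hold space.

Rewrite whole numbers such as 156K as 156.0K, because each lookup value assumes exactly one decimal digit (without this step, 156K would become 15600).

Append the lookup to each line and use pattern matching to replace each key with the zeros that follow it in the lookup.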

Finally print the line when no further matches are found.
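The same lookup extends to more suffixes by appending entries: each key is followed by one zero fewer than its power of ten, because the single decimal digit supplies one digit of the result. A sketch with G added (an extension beyond the original answer; GNU sed only, and the function name is hypothetical). It first rewrites whole numbers such as 156K as 156.0K, since the lookup assumes one decimal digit.

```shell
# Hypothetical helper (GNU sed only): hold-space lookup with a G entry added.
# K -> 2 zeros, M -> 5 zeros, G -> 8 zeros (one decimal digit is always present).
expand_sed_lookup() {
  sed -E '1{x;s/^/K00M00000G00000000/;x}
          s/(^|[^.0-9])([0-9]+)([KMG])/\1\2.0\3/gi
          :a;G;s/([0-9])(\.([0-9]))?([KMG])(.*)\n.*\4(0*).*/\1\3\6\5/i;ta
          P;d'
}

printf '%s\n' '1.3K apples' '156K mangos' '2.5G stars' | expand_sed_lookup
```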

potong

Here's how to convert them to both base-10 (powers of 1000) and base-2 (powers of 1024) scales, from kilo (K) all the way up to the recently added quetta (Q, 10^30), plus one special case that treats B (billion) as G:

echo '1.3K apples
87.9K oranges
156K mangos
541.7K carrots
1.8M potatoes
1189.135311B peaches
1189.135311G grapes
73.3231T sourgrapes' | gtee >( gcat -b >&2;) | 

awk '
 ($!NF = sprintf("%s %s %s %s", __ = toupper($++_), __, __, $++_))^!_ + \
 ($_   = __ * ((_____ = (_+=_)^_*_--) + _-_^_--)^(sub("B$","G",__)^!_ * \
                    index(____="KMGTPEZYRQ",___=substr(__,length(__)))))^!_ + \
 ($++_ = __ * _____^index(____, ___))^(_-=_)' CONVFMT='%.f' |

column -t

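Output (the numbered lines are the input, copied to stderr by gcat -b; the table columns are the original value, the base-10 value, the base-2 value, and the name):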

     1  1.3K apples
     2  87.9K oranges
     3  156K mangos
     4  541.7K carrots
     5  1.8M potatoes
     6  1189.135311B peaches
     7  1189.135311G grapes
     8  73.3231T sourgrapes

1.3K          1300            1331            apples
87.9K         87900           90010           oranges
156K          156000          159744          mangos
541.7K        541700          554701          carrots
1.8M          1800000         1887437         potatoes
1189.135311B  1189135311000   1276824317816   peaches
1189.135311G  1189135311000   1276824317816   grapes
73.3231T      73323100000000  80619601034582  sourgrapes
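The one-liner above is deliberately compressed. A more readable sketch of the same idea, assuming the abbreviation is always in the first field, finds the suffix's position n in the string KMGTPEZYRQ and prints the value times 1000^n and times 1024^n; the function name is hypothetical, and B is mapped to G as in the answer.

```shell
# Hypothetical readable version: original, SI (1000^n) and binary (1024^n) values
si_and_iec() {
  awk '
  BEGIN { suffixes = "KMGTPEZYRQ" }    # position n in this string => 1000^n / 1024^n
  NF == 0 { print; next }              # keep empty lines as they are
  {
    num = $1
    s = toupper(substr(num, length(num), 1))
    if (s == "B") s = "G"              # treat B(illion) as G
    n = index(suffixes, s)
    if (n > 0 && num ~ /^[0-9]+(\.[0-9]+)?[A-Za-z]$/) {
      v = substr(num, 1, length(num) - 1) + 0
      $1 = sprintf("%s %.0f %.0f", num, v * 1000 ^ n, v * 1024 ^ n)
    }
    print
  }'
}

printf '%s\n' '1.3K apples' '2M x' '1B y' | si_and_iec
```

%.0f rounds to the nearest integer, like CONVFMT='%.f' in the answer. For the largest suffixes, doubles keep only about 15 to 16 significant digits, so trailing digits are not exact.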
RARE Kpop Manifesto