Convert all number abbreviations to numeric values in a text file

Question

I'd like to convert all number abbreviations such as 1K, 100K, 1M, etc. in a text file into plain numeric values such as 1000, 100000, 1000000, etc.

So for example, if I have the following text file:

1.3K apples
87.9K oranges
156K mangos
541.7K carrots
1.8M potatoes

I would like to convert it to the following in bash:

1300 apples
87900 oranges
156000 mangos
541700 carrots
1800000 potatoes

So far I have been replacing each abbreviation string with its full numeric value, like so:

sed -e 's/1K/1000/g' -e 's/1M/1000000/g' text-file.txt

My problem is that this approach cannot find and replace ALL the possible variations of the abbreviations. I'd like it to handle at least abbreviations with up to one decimal place.

oguz ismail · Answer 1 · 2021-01-03T14:29:27.563

98

Use numfmt from GNU coreutils, don't reinvent the wheel.

$ numfmt --from=si <file
1300 apples
87900 oranges
156000 mangos
541700 carrots
1800000 potatoes

If abbreviated numbers may appear in any field, you can use:

numfmt --from=si --field=- --invalid=ignore <file

edited Jan 03 '21 at 14:29

answered Jan 02 '21 at 07:21

oguz ismail


score 29 · Answer 2 · edited Jan 02 '21 at 14:45

Could you please try the following, written and tested with the shown samples in GNU awk.

awk '
{
  if(sub(/[kK]$/,"",$1)){
    $1*=1000
  }
  if(sub(/[mM]$/,"",$1)){
    $1*=1000000
  }
}
1
' Input_file

Explanation: a detailed explanation of the above.

awk '                     ##Starting the awk program from here.
{
  if(sub(/[kK]$/,"",$1)){ ##If the 1st field ends with k/K, remove that suffix and do the following.
    $1*=1000              ##Multiply the 1st field by 1000.
  }
  if(sub(/[mM]$/,"",$1)){ ##If the 1st field ends with m/M, remove that suffix and do the following.
    $1*=1000000           ##Multiply the 1st field by 1000000.
  }
}
1                         ##1 is an always-true condition with no action, so awk prints the (possibly modified) line.
' Input_file              ##Name of the input file.

Output will be as follows.

1300 apples
87900 oranges
156000 mangos
541700 carrots
1800000 potatoes
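A caveat shared by the awk answers here: assigning to $1 makes awk rebuild the whole line with OFS, so runs of spaces or tabs between fields collapse to a single space. If the original spacing matters, a sketch like the following rewrites only the leading number and leaves the rest of the line untouched. It assumes only K/M suffixes with an optional decimal part, and expand_keep_spacing is a hypothetical name.

```shell
# expand_keep_spacing is a hypothetical helper name: it reads stdin and
# rewrites only a leading number with a K/M suffix, keeping the rest of the
# line (including its original spacing) untouched.
expand_keep_spacing() {
  awk '{
    if (match($0, /^[0-9]+(\.[0-9]+)?[kKmM]([ \t]|$)/)) {
      len = RLENGTH
      if (substr($0, len, 1) ~ /[ \t]/) len--    # do not count the separator
      num  = substr($0, 1, len - 1)              # e.g. "1.3"
      suf  = toupper(substr($0, len, 1))         # "K" or "M"
      rest = substr($0, len + 1)                 # everything after the suffix
      # %.0f prints a whole number, rounding any leftover fraction
      $0 = sprintf("%.0f", num * (suf == "K" ? 1000 : 1000000)) rest
    }
    print
  }'
}

printf '1.3K   apples\n156K\tmangos\nno number here\n' | expand_keep_spacing
```

Lines that do not start with an abbreviated number are printed unchanged.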

anubhava · Answer 3 · 2021-01-02T15:04:58.070

17

Another awk variant:

awk '{q = substr($1, length($1));
$1 *= (q == "M" ? 1000000 : (q=="K"?1000:1))} 1' file

1300 apples
87900 oranges
156000 mangos
541700 carrots
1800000 potatoes

edited Jan 02 '21 at 15:04

answered Jan 02 '21 at 05:39

anubhava


score 16 · Answer 4 · edited Jan 03 '21 at 05:15


This performs a global substitution (in case you have more than one string to convert per line):

perl -pe 's{\b(\d+(?:\.\d+)?)([KM])\b}{ $1*1000**(index("KM",$2)+1) }ge' file

edited Jan 03 '21 at 05:15

oguz ismail


answered Jan 02 '21 at 05:57

kvantour · Answer 5 · 2021-01-02T16:54:11.777

8

A somewhat more programmatic approach, based on this answer: build a table of all the possible conversion factors and perform the multiplication when needed:

awk 'BEGIN{f["K"]=1000; f["M"]=1000000}
     match($1,/[a-zA-Z]+/){$1 *= f[substr($1,RSTART,RLENGTH)]}
     1' file
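One thing to watch with this lookup: if the first field carries a letter that is not a key of f (say 5B), f[...] is empty, which awk treats as 0 in arithmetic, so the value is wiped out. A guarded sketch (expand_known is a hypothetical name) only multiplies when the suffix is a known key:

```shell
# expand_known is a hypothetical helper name: same lookup table, but an
# unknown suffix leaves the line alone instead of multiplying by an
# empty (zero) entry.
expand_known() {
  awk 'BEGIN { f["K"] = 1000; f["M"] = 1000000 }
       match($1, /[a-zA-Z]+/) {
         s = substr($1, RSTART, RLENGTH)
         if (s in f) $1 *= f[s]
       }
       1'
}

printf '1.3K apples\n5B pears\n42 plums\n' | expand_known
```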

edited Jan 02 '21 at 16:54

answered Jan 02 '21 at 11:34

kvantour


score 8 · Answer 6 · answered Jan 02 '21 at 15:00


With GNU awk for gensub():

$ awk '
    BEGIN { mult[""]=1; mult["k"]=1000; mult["m"]=1000000 }
    { $1 *= mult[gensub(/[^[:alpha:]]/,"","g",tolower($1))] }
1' file
1300 apples
87900 oranges
156000 mangos
541700 carrots
1800000 potatoes
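gensub() is a GNU awk extension. Assuming you need to run under a POSIX awk such as mawk or a BSD awk, the same lookup can be fed by copying the field and stripping it with gsub(); expand_portable is a hypothetical name.

```shell
# expand_portable is a hypothetical helper name. It uses only POSIX awk
# functions (tolower, gsub) instead of GNU awk's gensub().
expand_portable() {
  awk 'BEGIN { mult[""] = 1; mult["k"] = 1000; mult["m"] = 1000000 }
  {
    s = tolower($1)
    gsub(/[^a-z]/, "", s)    # keep only the letters, e.g. "1.3k" -> "k"
    $1 *= mult[s]
  }
  1'
}

printf '1.3K apples\n1.8M potatoes\n7 figs\n' | expand_portable
```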

answered Jan 02 '21 at 15:00

Ed Morton


The fourth bird · Answer 7 · 2021-01-02T14:52:20.983

Another option is to use bash with a regex containing capturing groups, one of which captures either M or K, and let bc do the arithmetic. If the pattern matches, test which suffix was captured to pick the multiplier:

while IFS= read -r line
do
  if [[ $line =~ ^([[:digit:]]+(\.[[:digit:]]+)?)([MK])( .*)$ ]];then
    # bc multiplies; "/ 1" with bc's default scale of 0 drops any fractional part (1300.0 becomes 1300)
    echo "$(bc <<< "${BASH_REMATCH[1]} * $([ "${BASH_REMATCH[3]}" == "K" ] && echo "1000" || echo "1000000") / 1")${BASH_REMATCH[4]}"
  else
    # Lines without a K/M abbreviation are printed unchanged instead of being dropped
    echo "$line"
  fi
done < text-file.txt

Output

1300 apples
87900 oranges
156000 mangos
541700 carrots
1800000 potatoes
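Since the question only needs up to one decimal place, the arithmetic can also be done in bash itself, without bc, by working in tenths: 1.3K is 13 tenths times 100. A sketch under that assumption (expand_line is a hypothetical name; non-matching lines are printed unchanged):

```shell
# expand_line is a hypothetical helper name (bash only): expands a leading
# N or N.D followed by K or M, with at most one decimal digit.
expand_line() {
  local line=$1 re='^([0-9]+)(\.([0-9]))?([KMkm])([[:space:]].*)?$'
  if [[ $line =~ $re ]]; then
    local int=${BASH_REMATCH[1]} frac=${BASH_REMATCH[3]:-0}
    local suf=${BASH_REMATCH[4]} rest=${BASH_REMATCH[5]} scale
    case $suf in
      [Kk]) scale=100 ;;      # 1000 / 10, because the value is in tenths
      [Mm]) scale=100000 ;;   # 1000000 / 10
    esac
    # 10# forces base 10 so a leading zero (e.g. 08K) is not read as octal
    printf '%s%s\n' "$(( (10#$int * 10 + frac) * scale ))" "$rest"
  else
    printf '%s\n' "$line"
  fi
}

printf '%s\n' '1.3K apples' '156K mangos' '1.8M potatoes' 'no abbreviation' |
  while IFS= read -r line; do expand_line "$line"; done
```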


dawg · Answer 8 · 2021-01-02T15:18:33.297

6

Given:

$ cat file
1.3K apples
87.9K oranges
156K mangos
541.7K carrots
1.8M potatoes

Just for giggles, pure Bash (with sed and bc):

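while read -r x y
do
    # Turn e.g. 1.3K into the bc expression 1.3*1000 (and xM into x*1000000)
    new_x=$(echo "$x" | sed -E 's/^([[:digit:].]*)[kK]/\1*1000/; s/^([[:digit:].]*)[mM]/\1*1000000/' | bc)
    # bc prints 1300.0 for 1.3*1000, which %d rejects as invalid, so drop the fraction first
    printf "%'d %s\n" "${new_x%.*}" "$y"
done <file

Prints (the ' flag in %'d adds the locale's thousands separators; use %d for plain numbers):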

1,300 apples
87,900 oranges
156,000 mangos
541,700 carrots
1,800,000 potatoes

edited Jan 02 '21 at 15:18

answered Jan 02 '21 at 14:00

dawg


potong · Answer 9 · 2021-01-26T11:55:57.650

6

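This might work for you (GNU sed):

sed -E '1{x;s/^/K00M00000/;x}
        s/(^|[^.0-9])([0-9]+)([KM])/\1\2.0\3/gi
        :a;G;s/([0-9])(\.([0-9]))?([KM])(.*)\n.*\4(0*).*/\1\3\6\5/i;ta
        P;d' file

Create a lookup and store it in the hold space.

Rewrite whole-number abbreviations such as 156K as 156.0K, so that every abbreviated number has exactly one decimal digit. Without this step 156K would come out as 15600, because the lookup only supplies the zeros still needed after one decimal digit.

Append the lookup to each line and use pattern matching to replace each key (K or M) with its value, i.e. the zeros that follow it in the lookup.

Finally, print the line when no further matches are found.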

edited Jan 26 '21 at 11:55

answered Jan 03 '21 at 01:05

potong


score 0 · Answer 10 · answered Feb 02 '23 at 12:29

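Here's how to convert them to both base-10 and base-2 scales, from kilo (K) all the way up to the recently added quetta (Q, 10^30 in base 10), plus one special case that treats B (billion) as G. gtee and gcat are GNU tee and cat as installed by Homebrew's coreutils on macOS; elsewhere plain tee and cat will do. The program is meant to run under mawk, gawk or nawk; gawk is used below.

echo '1.3K apples
87.9K oranges
156K mangos
541.7K carrots
1.8M potatoes
1189.135311B peaches
1189.135311G grapes
73.3231T sourgrapes' | gtee >( gcat -b >&2;) |
gawk '

 ($!NF = sprintf("%s %s %s %s", __ = toupper($++_), __, __, $++_))^!_ + \
 ($_   = __ * ((_____ = (_+=_)^_*_--) + _-_^_--)^(sub("B$","G",__)^!_ * \
                    index(____="KMGTPEZYRQ",___=substr(__,length(__)))))^!_ + \
 ($++_ = __ * _____^index(____, ___))^(_-=_)' CONVFMT='%.f' |
column -t

The numbered lines below are the copy of the input that tee sends to stderr; the table after them lists each original value, its base-10 value, its base-2 value and the item.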

     1  1.3K apples
     2  87.9K oranges
     3  156K mangos
     4  541.7K carrots
     5  1.8M potatoes
     6  1189.135311B peaches
     7  1189.135311G grapes
     8  73.3231T sourgrapes

1.3K          1300            1331            apples
87.9K         87900           90010           oranges
156K          156000          159744          mangos
541.7K        541700          554701          carrots
1.8M          1800000         1887437         potatoes
1189.135311B  1189135311000   1276824317816   peaches
1189.135311G  1189135311000   1276824317816   grapes
73.3231T      73323100000000  80619601034582  sourgrapes
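The program above is deliberately terse. Assuming the same behaviour (the first field is the abbreviated number, B is read as G, and the output is the original value, the base-10 value and the base-2 value followed by the remaining fields), a more readable sketch might look like this. expand_both is a hypothetical name; like the original's CONVFMT='%.f', it rounds with %.0f.

```shell
# expand_both is a hypothetical helper name. For each line it prints the
# first field, its base-10 value, its base-2 value, then the other fields.
expand_both() {
  awk 'BEGIN { units = "KMGTPEZYRQ" }
  {
    orig = $1
    suf = toupper(substr(orig, length(orig)))
    if (suf == "B") suf = "G"        # treat B(illion) as G
    p = index(units, suf)            # 0 when there is no known suffix
    n = orig + 0                     # numeric prefix, e.g. 1.3 from "1.3K"
    printf "%s %.0f %.0f", orig, n * 1000 ^ p, n * 1024 ^ p
    for (i = 2; i <= NF; i++) printf " %s", $i
    print ""
  }'
}

printf '1.3K apples\n1.8M potatoes\n' | expand_both | column -t
```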
