I have a data frame in R with box office numbers listed like $121.5M and $0.014M, and I'd like to convert them to plain numbers. I'm thinking of stripping the $ and M and then using basic multiplication. Is there a better way to do this?

Phillip Black

3 Answers

You could do this either by matching the non-numeric characters ([^0-9.]*) and replacing them with '':

 as.numeric(gsub("[^0-9.]*", '', "$121.5M"))
 #[1] 121.5

Or by specifically matching the $ and M ([$M]) and replacing them with '':

 as.numeric(gsub("[$M]", '',"$121.5M"))
 #[1] 121.5
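
Applied to a data frame column, as in the question, the same idea might look like the sketch below. The data frame `dat` and its column names are made up for illustration, and the code assumes every value is expressed in millions.

```r
# Hypothetical data frame; the column names are illustrative only
dat <- data.frame(gross = c("$121.5M", "$0.014M"), stringsAsFactors = FALSE)

# Strip "$" and "M", convert to numeric, then scale from millions to units
dat$gross_num <- as.numeric(gsub("[$M]", "", dat$gross)) * 1e6
```

The multiplication by 1e6 is only valid if every entry ends in M; mixed suffixes need a lookup.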

Update

If you have a vector like the one below:

v1 <- c("$1.21M", "$0.5B", "$100K", "$1T", "$0.9P", "$1.5K") 

Create another vector of multipliers and set its names to the corresponding abbreviations:

v2 <- setNames(c(1e3, 1e6, 1e9, 1e12, 1e15), c('K', 'M', 'B', 'T', 'P'))

Use that as an index to look up the multiplier for each abbreviation, and multiply it by the numeric part of the vector:

 as.numeric(gsub("[^0-9.]*", '', v1)) * v2[sub('[^A-Z]*', '', v1)]
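
If this is needed in more than one place, the lookup could be wrapped in a small function. The sketch below is one possible arrangement, not part of the original answer: the name `suffix_to_number` is hypothetical, and treating a value with no suffix as plain units is an assumption.

```r
# Hypothetical helper wrapping the lookup approach above
suffix_to_number <- function(x) {
  mult <- c(K = 1e3, M = 1e6, B = 1e9, T = 1e12, P = 1e15)
  num <- as.numeric(gsub("[^0-9.]", "", x))  # numeric part of each string
  suf <- gsub("[^A-Z]", "", x)               # suffix letter, "" if there is none
  scale <- unname(mult[suf])                 # NA where the suffix is missing or unknown
  scale[suf == ""] <- 1                      # assumption: no suffix means plain units
  num * scale
}

res <- suffix_to_number(c("$1.21M", "$100K", "$250"))
```

An unrecognized suffix yields NA instead of a silently wrong number, which makes bad input easier to spot.
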
akrun

This removes the $ and translates K and M to e3 and e6. There is an example very similar to this in the gsubfn vignette.

library(gsubfn)
x <- c("$1.21M", "$100K")  # input

ch <- gsubfn("[KM$]", list(K = "e3", M = "e6", "$" = ""), x)
as.numeric(ch)
## [1] 1210000  100000

The as.numeric line can be omitted if you don't need to convert it to numeric.
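
For comparison, the same suffix-to-exponent translation can be sketched in base R without gsubfn, using one sub call per symbol. This is an illustrative alternative, not taken from the vignette, and it assumes each suffix appears only at the end of the string.

```r
x <- c("$1.21M", "$100K")  # input

ch <- sub("$", "", x, fixed = TRUE)  # drop the literal dollar sign
ch <- sub("K$", "e3", ch)            # trailing K becomes e3
ch <- sub("M$", "e6", ch)            # trailing M becomes e6
out <- as.numeric(ch)                # "1.21e6" and "100e3" parse as ordinary numbers
```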

G. Grothendieck

The function extract_numeric from the tidyr package strips all non-numeric characters from a string and returns a number. With your example:

library(tidyr)
dat <- data.frame(revenue = c("$121.5M", "$0.014M"))
dat$revenue2 <- extract_numeric(dat$revenue)*1000000

dat
  revenue  revenue2
1 $121.5M 121500000
2 $0.014M     14000
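
In more recent tidyr releases, extract_numeric is marked as deprecated in favor of parse_number from the readr package. A minimal sketch of the same steps with that function, assuming readr is installed:

```r
library(readr)  # assumption: the readr package is installed

dat <- data.frame(revenue = c("$121.5M", "$0.014M"), stringsAsFactors = FALSE)

# parse_number() drops the non-numeric characters around the first number
dat$revenue2 <- parse_number(dat$revenue) * 1e6
```

As with extract_numeric, this assumes every value is in millions.
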
Sam Firke