
Are there any out-of-the-box functions in R that can handle this?

I have a CSV file that I am reading into a data frame using read.csv. One of the columns in the CSV contains currency values in this format:

Currency
--------
$1.2M
$3.1B
N/A

I would like to convert these into plain numbers that I can run calculations on, so the column would look like this:

Currency
----------
1200000
3100000000
NA

My initial thought was to subset the data frame into 3 parts, based on whether a row contains *M, *B, or N/A. Then I would use gsub to remove the $ and the M/B, multiply the remaining number by 1000000 or 1000000000, and finally rejoin the 3 subsets into 1 data frame.

However I'm curious if there's a simpler way to handle this sort of conversion in R.
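You don't need to split the data frame and rejoin it to follow this plan. Logical indexing can update the `M` rows and the `B` rows of a single result vector directly. Below is a minimal base R sketch. It assumes a data frame called `df` with a character column `Currency`. The example data and the names `is_m`, `is_b`, `num` and `value` are illustrative only:

```r
# Illustrative example data matching the question
df <- data.frame(Currency = c("$1.2M", "$3.1B", "N/A"), stringsAsFactors = FALSE)

# Logical masks for rows ending in M (millions) and B (billions)
is_m <- grepl("M$", df$Currency)
is_b <- grepl("B$", df$Currency)

# Strip "$", "M" and "B"; "N/A" cannot be parsed, so it becomes NA
num <- suppressWarnings(as.numeric(gsub("[$MB]", "", df$Currency)))

# Start from all-NA and fill in only the rows with a known suffix
value <- rep(NA_real_, nrow(df))
value[is_m] <- num[is_m] * 1e6
value[is_b] <- num[is_b] * 1e9
df$Currency <- value
```

Rows with neither suffix keep their initial `NA`, so the N/A case needs no separate subset.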

smci
user3246693

2 Answers


We could use gsubfn to replace 'B' and 'M' with 'e+9' and 'e+6', remove the '$', and then convert the result to numbers with as.numeric.

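# Mark 'N/A' as missing (optional: as.numeric() below would also turn it into NA, with a warning)
is.na(v1) <- v1=='N/A'
# Print large values in fixed rather than scientific notation
options(scipen=999)
library(gsubfn)
# Replace 'B' -> 'e+9', 'M' -> 'e+6', '$' -> '', then parse the strings as numbers
as.numeric(gsubfn('([A-Z]|\\$)', list(B='e+9', M='e+6',"$"=""),v1)) 
#[1]    1200000 3100000000         NA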

EDIT: Modified based on @nicola's suggestion

data

v1 <- c('$1.2M', '$3.1B', 'N/A')
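If you want to avoid the gsubfn dependency, the same idea also works in base R with `sub()`: rewrite the suffix as an exponent and let `as.numeric()` parse the string. This is an alternative sketch, not part of the answer above. `to_number` is a hypothetical helper name. The sketch assumes each value has at most a leading `$` and a single trailing `M` or `B`:

```r
v1 <- c('$1.2M', '$3.1B', 'N/A')

# Hypothetical helper: base R version of the suffix-to-exponent trick
to_number <- function(v) {
  v <- sub("^\\$", "", v)   # drop the leading dollar sign
  v <- sub("M$", "e6", v)   # millions -> scientific notation
  v <- sub("B$", "e9", v)   # billions -> scientific notation
  # "N/A" is not a valid number, so as.numeric() returns NA for it;
  # the coercion warning is suppressed here
  suppressWarnings(as.numeric(v))
}

to_number(v1)
```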
akrun
    Since you are using `gsubfn`, wouldn't `as.numeric(gsubfn('([A-Z]|\\$)', list(B='e+9', M='e+6',"$"=""),v1))` be simpler? There is no need to `sapply`, `unname`, `eval` and `parse` some text, nor to set `NA` beforehand (since `as.numeric` will do it by itself). – nicola Nov 18 '15 at 08:20
  • @nicola Yes, it is simpler. I wonder why I used `sapply`. Thanks. – akrun Nov 18 '15 at 08:23
    Thank you! That was exactly what I was looking for. I did, however, drop the "is.na(v1) <- v1=='N/A'" line, as I realized I could do this in read.csv with this option: na.strings="N/A" – user3246693 Nov 19 '15 at 19:36
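Here is a minimal sketch of the `na.strings` approach from the comment above. It assumes the CSV content shown in the question, and `text =` stands in for a real file path. `na.strings` matches each field exactly, so its value must use the same spelling and case as the file (`N/A` here):

```r
csv <- "Currency
$1.2M
$3.1B
N/A"

# na.strings turns the literal "N/A" into NA while the file is read
df <- read.csv(text = csv, na.strings = "N/A", stringsAsFactors = FALSE)

# Only the last row is missing; the other values are still strings
is.na(df$Currency)
```

After reading, only the `$`, `M` and `B` handling remains.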

Another way is to use a for loop:

x <- c("1.2M", "2.5M", "1.6B", "N/A")
x <- ifelse(x=="N/A", NA, x)
# Keep only the digits and the decimal point, then convert to numbers
num <- as.numeric(gsub("[^0-9.]", "", x))

for(i in seq_along(x)) {
  if(grepl('M', x[i]))
    print(num[i] * 1e6)   # millions
  else
    print(num[i] * 1e9)   # billions (an NA value stays NA)
}

# [1] 1200000
# [1] 2500000
# [1] 1.6e+09
# [1] NA
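The same conversion can also be written without a loop, in the vectorized style R generally favours. The idea is to look up each value's multiplier by its last character in a named vector. This is a minimal sketch. `multiplier` and `suffix` are illustrative names, and it assumes every non-missing value ends in `M` or `B`. Any other suffix yields `NA`:

```r
x <- c("1.2M", "2.5M", "1.6B", "N/A")

# Multiplier for each supported suffix
multiplier <- c(M = 1e6, B = 1e9)

# Numeric part: "N/A" leaves an empty string, which as.numeric() turns into NA
num <- as.numeric(gsub("[^0-9.]", "", x))

# Suffix: the last character of each string ("A" for "N/A")
suffix <- substring(x, nchar(x))

# Unknown suffixes index to NA, so "N/A" ends up as NA
result <- num * unname(multiplier[suffix])
result
```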
Ronak Shah
  • Thank you for the advice, though I am doing my best to break myself of the habit of using loops. R seems to discourage them, and I'm trying to get a feel for the "R way" of doing things. I do, however, fully understand where you are coming from, as most other scripting languages would handle this sort of thing with a loop :) – user3246693 Nov 19 '15 at 19:38
  • @user3246693 yes, that is true! Loops are not recommended in R. – Ronak Shah Nov 20 '15 at 03:26