26

I have a column in a dataframe as follows:

COL1
$54,345
$65,231
$76,234

How do I convert it into this:

COL1
54345
65231
76234

The way I tried it at first was:

df$COL1<-as.numeric(as.character(df$COL1))

That didn't work because it said NA's were introduced.

Then I tried it like this:

df$COL1<-as.numeric(gsub("\\$","",as.character(df$COL1)))

And the same this happened.

Any ideas?

zx8754
  • 52,746
  • 12
  • 114
  • 209
Nick
  • 833
  • 2
  • 8
  • 11

4 Answers4

53

We could use parse_number from readr package which removes any non-numeric characters.

library(readr)
parse_number(df$COL1)
#[1] 54345 65231 76234
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
37

The reason why the gsub didn't work was there was , in the column, which is still non-numeric. So when convert to 'numeric' with as.numeric, all the non-numeric elements are converted to NA. So, we need to remove both , and $ to make it work.

df1$COL1 <- as.numeric(gsub('[$,]', '', df1$COL1))

We match the $ and , inside the square brackets ([$,]) so that it will be considered as that character ($ left alone has special meaning i.e. it signifies the end of the string.) and replace it with ''.

Or we can escape (\\) the character ($) to match it and replace by ''.

df1$COL1 <- as.numeric(gsub('\\$|,', '', df1$COL1))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    This works great with positive numbers but does not seem to work for negative currency. For example, after I run the code, all positives work but negative like this, ($100,00.04), does not. – Alex S. Sandoval Oct 20 '20 at 17:34
  • @AlexS.Sandoval is there a `-` sign before that? – akrun Oct 20 '20 at 21:10
  • No. the source of the data is formatted to show negative currency in parenthesis. – Alex S. Sandoval Oct 21 '20 at 15:29
  • 2
    @AlexS.Sandoval regex is based on patterns. Here, the parenthesis is not included. If you include that it should work – akrun Oct 21 '20 at 19:22
  • 2
    To take care of negative currency represented by enclosing parentheses try this before the call to as.numeric: `df1$COL1 <- as.numeric(gsub('[$,)]', '', df1$COL1)); df1$COL1 <- sub('[(]', '-', df1$COL1` – MCornejo Oct 23 '20 at 18:00
1

Another option using stringr library to remove '$' and ',' then convert as follows:

df %>% mutate(COL1 = COL1 %>% str_remove_all("\\$,") %>% as.numeric())
charlehl
  • 11
  • 1
0

Nested gsub to handle negatives and transform to make it functional and to take advantage of NSE

transform(df, COL1 = as.numeric(gsub("[$),]", "", gsub("^\\(", "-", COL1))))
hello_friend
  • 5,682
  • 1
  • 11
  • 15