0

I have a df with several columns that have dollar values preceded by the "$" like so:

> str(data)
Classes ‘data.table’ and 'data.frame':  196879 obs. of  32 variables:
 $ City             : chr  "" "" "" "" ...
 $ Company_Goal     : chr  "" "" "" "" ...
 $ Company_Name     : chr  "" "" "" "" ...
 $ Event_Date       : chr  "5/14/2016" "9/26/2015" "9/12/2015" "6/3/2017" ...
 $ Event_Year       : chr  "FY 2016" "FY 2016" "FY 2016" "FY 2017" ...
 $ Fundraising_Goal : chr  "$250" "$200" "$350" "$0" ...
 $ Name             : chr  "Heart Walk 2015-2016 St. Louis MO" "Heart Walk 2015-2016 Canton, OH" "Heart Walk 2015-2016 Dallas, TX" "FDA HW 2016-2017 Albany, NY WO-65355" ...
 $ Participant_Id   : chr  "2323216" "2273391" "2419569" "4088558" ...
 $ State            : chr  "" "OH" "TX" "" ...
 $ Street           : chr  "" "" "" "" ...
 $ Team_Average     : chr  "$176" "$123" "$306" "$47" ...
 $ Team_Captain     : chr  "No" "No" "Yes" "No" ...
 $ Team_Count       : chr  "7" "6" "4" "46" ...
 $ Team_Id          : chr  "152788" "127127" "45273" "179207" ...
 $ Team_Member_Goal : chr  "$0" "$0" "$0" "$0" ...
 $ Team_Name        : chr  "Team Clayton" "Cardiac Crusaders" "BIS - Team Myers" "Independent Walkers" ...
 $ Team_Total_Gifts : chr  "$1,230 " "$738" "$1,225 " "$2,145 " ...
 $ Zip              : chr  "" "" "" "" ...
 $ Gifts_Count      : chr  "2" "1" "2" "1" ...
 $ Registration_Gift: chr  "No" "No" "No" "No" ...
 $ Participant_Gifts: chr  "$236" "$218" "$225" "$0" ...
 $ Personal_Gift    : chr  "$0" "$0" "$0" "$250" ...
 $ Total_Gifts      : chr  "$236" "$218" "$225" "$250" ...
 $ MATCH_CODE       : chr  "UX000" "UX000" "UX000" "UX000" ...
 $ TAP_LEVEL        : chr  "X" "X" "X" "X" ...
 $ TAP_DESC         : chr  "" "" "" "" ...
 $ TAP_LIFED        : chr  "" "" "" "" ...
 $ MEDAGE_CY        : chr  "0" "0" "0" "0" ...
 $ DIVINDX_CY       : chr  "0" "0" "0" "0" ...
 $ MEDHINC_CY       : chr  "0" "0" "0" "0" ...
 $ MEDDI_CY         : chr  "0" "0" "0" "0" ...
 $ MEDNW_CY         : chr  "0" "0" "0" "0" ...
 - attr(*, ".internal.selfref")=<externalptr> 

I am trying to remove all of the "$" characters but have been unable to do so. I have tried the suggestions provided in this post as well as this one, but in both cases the data remains unchanged.

Help?

zsad512
  • The second answer you linked to should work if you use `\\$` as the pattern (or use the `fixed` argument of `gsub()`) – Adam Spannbauer Jan 20 '18 at 14:52
  • You need to understand the difference between metacharacter and non-metacharacter before complaining and then downvoting the answer – akrun Jan 20 '18 at 16:00

2 Answers

3

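The dollar sign is a metacharacter in regular expressions: it anchors a match to the end of the string (see here for more info). The gsub() function treats the pattern as a regex by default, so an unescaped "$" matches the empty position at the end of each value and nothing is removed.

To match a literal $, you have to escape it with a backslash. Inside an R string the backslash itself must be doubled, so the pattern is written as "\\$".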
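A small check of the three pattern variants may make the difference concrete. This is only an illustrative sketch; the vector `x` and the result names are made up for the demonstration and are not part of the original data.

```r
# Illustrative values only (copied in style from the question's columns)
x <- c("$176", "$1,230 ")

# Unescaped: "$" is the end-of-string anchor, so no character is removed
unescaped <- gsub("$", "", x)

# Escaped: "\\$" in R source is the regex \$, which matches a literal dollar sign
escaped <- gsub("\\$", "", x)

# fixed = TRUE: the pattern is taken literally, so no escaping is needed
literal <- gsub("$", "", x, fixed = TRUE)

print(unescaped)
print(escaped)
print(literal)
```

The first call returns the input unchanged, which is consistent with the "data remains unchanged" symptom described in the question. Note that the comma and trailing space in a value like "$1,230 " are left in place by all three calls.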

#sample data
df = data.frame(Team_Average = c("$176", "$123", "$306"),
                Name = c("Heart Walk 2015-2016 St. Louis MO", 
                         "Heart Walk 2015-2016 Canton, OH",
                         "Heart Walk 2015-2016 Dallas, TX"),
                stringsAsFactors = FALSE)

df[] = lapply(df, gsub, pattern="\\$", replacement="")

Alternatively you can use gsub's option of fixed=TRUE to match the pattern literally.

df[] = lapply(df, gsub, pattern="$", replacement="", fixed=TRUE)
Adam Spannbauer
  • I suggest you start with `ind <- sapply(df, is.character)`, and then `df[ind] <- lapply(df[ind], ...)`. Even if this data.frame is all strings, if there are any non-`character` columns (including `logical`, `integer`, `numeric`, and even `factor`), they are silently converted to `character`. One could argue `factor`s should also be `gsub`ed, but since they are not preserved as factors, this is not desired behavior. – r2evans Jan 20 '18 at 17:45
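
The suggestion in the comment above could be sketched as follows. The sample data frame is hypothetical (the integer column `Team_Count` is added here only to show that it is left alone), and the sketch assumes a plain data.frame; with a data.table, `df[ind]` indexes rows rather than columns, so it would first need `as.data.frame()` or data.table's own column syntax.

```r
# Hypothetical sample: one character column and one integer column
df <- data.frame(Team_Average = c("$176", "$123", "$306"),
                 Team_Count   = c(7L, 6L, 4L),
                 stringsAsFactors = FALSE)

# Logical index of the character columns only
ind <- sapply(df, is.character)

# Strip the literal "$" from those columns; other columns are not touched
df[ind] <- lapply(df[ind], gsub, pattern = "$", replacement = "", fixed = TRUE)

str(df)
```

Because only the columns selected by `ind` are passed through gsub(), `Team_Count` stays an integer instead of being converted to character.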
2

The other answers work nicely on the example provided. However, if the data set contained any numeric columns, then running gsub() or stringr::str_replace_all() via lapply() would coerce those numeric columns to character:

library(stringr)
library(dplyr)

d <- tibble(
  x = c("$200", "$191.40", "80.12"),
  y = c("$test", "column", "$foo"),
  z = 1:3
)

d_chr <- d  # work on a copy so that d keeps its integer column for the next example
d_chr[] <- lapply(d_chr, gsub, pattern = "\\$", replacement = "")
d_chr

# A tibble: 3 x 3
  x      y      z    
  <chr>  <chr>  <chr>
1 200    test   1    
2 191.40 column 2    
3 80.12  foo    3 

Note the class of z above.

Here is a tidyverse approach to removing $ from all character columns:

d %>% 
  mutate_if(
    is.character,
    ~ str_replace_all(.x, "\\$", "")
    )

# A tibble: 3 x 3
  x      y          z
  <chr>  <chr>  <int>
1 200    test       1
2 191.40 column     2
3 80.12  foo        3
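
In dplyr 1.0.0 and later, mutate_if() is superseded by across(), so the same idea can also be written as below. This is a sketch assuming dplyr >= 1.0.0 and stringr are installed; the name `d_clean` is made up for the example, and fixed("$") is used in place of the escaped regex.

```r
library(dplyr)
library(stringr)

# Same sample data as above, rebuilt so the block is self-contained
d <- tibble(
  x = c("$200", "$191.40", "80.12"),
  y = c("$test", "column", "$foo"),
  z = 1:3
)

# Apply the replacement to character columns only; z is left as integer
d_clean <- d %>%
  mutate(across(where(is.character), ~ str_replace_all(.x, fixed("$"), "")))

d_clean
```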
davechilders