0

I am trying to write a function which uses gsub to find and replace certain strings. I have 10+ excel spreadsheets with a list of monetary values using standard Swedish separators for thousands and decimals, i.e. a decimal is noted by a comma and not a period, and the thousands separator is a space and not a comma. What in English would be 1,000.31 becomes 1 000,31 in Swedish? I am trying to write a function that will eliminate the space where the thousands separator is, and find the comma and replace it with a period.

Normal gsub functions work fine:

    df$Saldo <- gsub(",", ".", df$Saldo)
    df$Saldo <- gsub(" ", "", df$Saldo)

However, I will need to do this on 10+ dataframes, so I thought it would be more efficient to write a function that could be applied to each dataframe (The function I have tried to write plus a reproducible example are provided below). This is where I run into problems. I should also mention I am looking for a dplyr solution. The error I get with the function (below) is Error: Quosures can only be unquoted within a quasiquotation context.

I'll admit that, despite having read and reread about non-standard evaluation and quasi-quotation, I'm still having difficulty understanding these concepts. I have read Programming with dplyr

https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html.

I have also read:

https://rlang.r-lib.org/reference/quasiquotation.html.

Finally, I have studied several vaguely similar problems/solutions on stack overflow, but the solutions there don’t seem to work for me. Most of the problems deal with getting the regular expression to work, and not getting regular expressions to work inside a function.

Here is the function, such as it is:

     library(dplyr)
     replace_commas <- function(df1, c_name) {
      c_name <- enquo(c_name)
      df1 <- df1 %>% gsub(",", ".",!! c_name)
      return(df1)
      }

And here is a reproducible example:

    df <- data.frame(Date = c("2018-12-15", "2018-11-04", "2018-10-26"), Saldo = c("123 432,45", "87 546,23", "9 564,82"))

    df_new <- replace_commas(df, Saldo)

The immediate thing I want to achieve is a function that finds and replaces commas with periods. However, I would also appreciate a pointer on how to also include a regular expression in the function for removing unneeded spaces.

BPDESILVA
  • 2,040
  • 5
  • 15
  • 35
Kuns
  • 25
  • 3
  • I see what you want now, it has little to do with gsub and Swedish number formatting, and more with quote and quasi quotation. I suggest you change the title of the question to better reflect that, and drawing the attention of the people who understand it well. Removing all my other comments ;) – PavoDive Jun 16 '19 at 10:37
  • To remove unwanted spaces. Find a space and replace it with nothing `library(stringr); x <- c(' foo ', 'bar '); str_replace_all(x, "[ ]", "")` – Tony Ladson Jun 16 '19 at 10:37
  • Thanks @PavoDive. I have changed the title per your recommendation. – Kuns Jun 16 '19 at 11:25
  • `readxl` can probably do right from the importation what you're trying to do afterwards – moodymudskipper Jun 21 '19 at 18:12

1 Answers1

0

The following version of your code works to remove commas and replace them with decimals and also to remove spaces but you might want to use R's builtin facilities instead as shown further down.

library(dplyr)

as.swedish <- function(from) as.numeric(gsub(" ", "", sub(",", ".", from)))
replace_commas <- function(df1, c_name) {
  c_name <- enquo(c_name)
  df1 %>% mutate_at(vars(!!c_name), as.swedish)
}   
df %>% replace_commas(Saldo)

The following assume that there are spaces as thousands separators and commas as decimals except 1a which only assumes commas as decimals.

1) read.csv2 We can define a swedish class and then use it in read.csv2. This uses the one-line function as.swedish defined above.

# test data
Lines <- "Letter;Number\nA;1 200,3\nB;32\nC;2511,01"

# define swedish class
setClass("swedish")
setAs("character", "swedish", as.swedish)
setAs("factor", "swedish", as.swedish)

read.csv2(text = Lines, colClasses = list(Number = "swedish"))

giving:

  Letter  Number
1      A 1200.30
2      B   32.00
3      C 2511.01

1a) If we don't have spaces as thousands separators and only have commas as decimals then we can use read.csv2:

Lines2 <- "Letter;Number\nA;1200,3\nB;32\nC;2511,01"
read.csv2(text = Lines2)

giving:

  Letter  Number
1      A 1200.30
2      B   32.00
3      C 2511.01

2) convert df or using swedish class from above to convert Saldo in df:

transform(df, Saldo = as(Saldo, "swedish"))

giving:

        Date     Saldo
1 2018-12-15 123432.45
2 2018-11-04  87546.23
3 2018-10-26   9564.82

2a) This also works and only uses only the as.swedish function from above but not the swedish class.

transform(df, Saldo = as.swedish(Saldo))

or almost the same with dplyr df %>% mutate(Saldo = as.swedish(Saldo))

3) function to convert df or define this function where names is a character vector of column names to convert from Swedish. Neither quasi-quotation nor corresponding builtin R facilities are needed if we pass the names as character strings which seems like a better design anyways. This uses the as.swedish function from above but not the swedish class.

swedish <- function(data, names) replace(data, names, lapply(data[names], as.swedish))
swedish(df, "Saldo")

or

library(dplyr)
df %>% swedish("Saldo")
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • Wow! Thank you for your very thorough answer! I have tried all the solutions, and they all worked of course. I will have to study this to make sure I understand all the solutions. For example, I'm not familiar with setClass and setAs. – Kuns Jun 16 '19 at 19:58