2

I have a dataframe that looks like this:

df <-
  structure(
    list(
      Exception1 = c(
        "Comments from {2}: {0}",
        "status updated to {1} by {2}. Description:{0}",
        "status updated to {1} by {2}. Description:{0}",
        "information only.",
        "status updated to {1} by {2}. Description:{0}",
        "status updated to {1} by {2}. Description:{0}"
      ),
      Exception2 = c(
        "Customer {0} said bla",
        "Status updated to {1}",
        "Customer said {2}",
        "User {0} foo",
        "{0} {1}",
        "{1} {2}"
      ),
      ARGUMENT1 = c("OK", " ", " ", "PAY9089723089-98391", " ", " "),
      ARGUMENT2 = c(
        "null",
        "Processing",
        "Reconciled",
        "null",
        "Processing",
        "Reconciled"
      ),
      ARGUMENT3 = c(
        "company name",
        "company name",
        "company name",
        "null",
        "company name",
        "company name"
      )
    ),
    row.names = c(NA, 6L),
    class = "data.frame"
  )

:

| Exception1                                    | Exception2            | ARGUMENT1           | ARGUMENT2  | ARGUMENT3    |
|-----------------------------------------------|-----------------------|---------------------|------------|--------------|
| Comments from {2}: {0}                        | Customer {0} said bla | OK                  | null       | company name |
| status updated to {1} by {2}. Description:{0} | Status updated to {1} |                     | Processing | company name |
| status updated to {1} by {2}. Description:{0} | Customer said {2}     |                     | Reconciled | company name |
| information only.                             | User {0} foo          | PAY9089723089-98391 | null       | null         |
| status updated to {1} by {2}. Description:{0} | {0} {1}               |                     | Processing | company name |
| status updated to {1} by {2}. Description:{0} | {1} {2}               |                     | Reconciled | company name |

The Exception1 and Exception 2 columns (there are a couple more Exception columns which I removed for readability) contain placeholders {} that are meant to be replaced with the values in the ARGUMENT* columns.

I have been looking at ways to accomplish this and have been relatively successful but I am still lacking the experience to do it better.

I wrote a simple function that does the replace through gsub:

excp_ren2 <- function(x) {
  x %<>%
    gsub("\\{1\\}", x["ARGUMENT2"], .) %>%
    gsub("\\{0\\}", x["ARGUMENT1"], .) %>%
    gsub("\\{2\\}", x["ARGUMENT3"], .)
  x
}

And then have been using apply and its variances. I have accomplished for example an OK result with this:

new_df <-
  df %>% apply(
    .,
    MARGIN = 1,
    FUN = function(x)
      excp_ren2(x)
  ) %>% as.data.frame()

with the only problem that this transposes the matrix which is not really a problem.

I am looking for better ways to do this, I thought I was going to be able to do this through mutate_* but I think I lose access to the column names for the row inside the function or at least I do not know how to do it. Any ideas on simpler ways to accomplish this?

Thanks!

  • 2
    You are using `apply` with `MARGIN = 1` on all the columns. I guess you are interested only in the firsst column `Exception1 `) In that case, just apply the function on that column – akrun Dec 27 '19 at 21:22

3 Answers3

2

Instead of doing this by row (and applying the function on each column rather than 'Exception1'), we can use str_replace (which is vectorized) in a pipe

library(stringr)
library(dplyr)
df %>%
  transmute(new =  str_replace_all(Exception1, "\\{1\\}", ARGUMENT2) %>% 
                   str_replace_all("\\{0\\}", ARGUMENT1) %>% 
                   str_replace_all("\\{2\\}", ARGUMENT3))
#                                                  new
#1                                  Comments from company name: OK
#2 status updated to Processing by company name. \\nDescription:\n
#3 status updated to Reconciled by company name. \\nDescription:\n
#4                  PCard order invoices are for information only.
#5 status updated to Processing by company name. \\nDescription:\n
#6 status updated to Reconciled by company name. \\nDescription:\n

If we have multiple columns, we can make use of mutate_at or transmute_at

df %>%
   transmute_at(vars(starts_with("Exception")), ~ 
           str_replace_all(., "\\{1\\}", ARGUMENT2) %>% 
                   str_replace_all("\\{0\\}", ARGUMENT1) %>% 
                   str_replace_all("\\{2\\}", ARGUMENT3))
#                    Exception1                   Exception2
#1                              Comments from company name: OK         Customer OK said bla
#2 status updated to Processing by company name. Description:  Status updated to Processing
#3 status updated to Reconciled by company name. Description:    Customer said company name
#4                                           information only. User PAY9089723089-98391 foo
#5 status updated to Processing by company name. Description:                    Processing
#6 status updated to Reconciled by company name. Description:       Reconciled company name
akrun
  • 874,273
  • 37
  • 540
  • 662
1

Maybe something like this

clean_pipe <- . %>% 
  mutate(new_string = Exception1 %>% str_replace_all(pattern = "\\{0\\}",replacement = ARGUMENT1)) %>% 
  mutate(new_string = new_string %>% str_replace_all(pattern = "\\{1\\}",replacement = ARGUMENT2)) %>% 
  mutate(new_string = new_string %>% str_replace_all(pattern = "\\{2\\}",replacement = ARGUMENT3))

df %>% 
  clean_pipe
Bruno
  • 4,109
  • 1
  • 9
  • 27
1

The way you used { } to demarcate made me think of using glue, which operates in a similar way. To make glue templates that match the column names in your data, first use a named list in stringr::str_replace_all to match patterns with replacements all in one step. Then create the glue objects from the "Exception*" columns. Based on this post (R dplyr: rowwise + mutate (+glue) - how to get/refer row content?), you'll need to use rowwise because otherwise it will try to use all the values of each argument column for each template. I'd wanted to put both mutate_at steps into one function, but had some trouble with scoping, so this was the neatest I could get working.

library(dplyr)
library(tidyr)

replacements <- c("\\{1\\}" = "{ARGUMENT2}",
                  "\\{0\\}" = "{ARGUMENT1}",
                  "\\{2\\}" = "{ARGUMENT3}")

as_tibble(df) %>%
  rowwise() %>%
  mutate_at(vars(starts_with("Exception")), stringr::str_replace_all, replacements) %>%
  mutate_at(vars(starts_with("Exception")), ~as.character(glue::glue(.)))
#> Source: local data frame [6 x 5]
#> Groups: <by row>
#> 
#> # A tibble: 6 x 5
#>   Exception1                  Exception2       ARGUMENT1     ARGUMENT2 ARGUMENT3
#>   <chr>                       <chr>            <chr>         <chr>     <chr>    
#> 1 Comments from company name… Customer OK sai… OK            null      company …
#> 2 "status updated to Process… Status updated … " "           Processi… company …
#> 3 "status updated to Reconci… Customer said c… " "           Reconcil… company …
#> 4 information only.           User PAY9089723… PAY908972308… null      null     
#> 5 "status updated to Process… "  Processing"   " "           Processi… company …
#> 6 "status updated to Reconci… Reconciled comp… " "           Reconcil… company …

Note that because some strings are empty, you've got extra blank spaces in your result that you can trim with trimws.

camille
  • 16,432
  • 18
  • 38
  • 60