
I want to replace special characters such as ® and °F in all the columns of my data frame with their HTML entities.

library(dplyr)
library(stringr)

special_char <- function(df) {
  df %>%
    mutate_all(.funs = ~ str_replace_all(.x, pattern = "®", replacement = "&reg;"))
}

However, this code does not replace ® with &reg; in the columns as I want. Instead, the ® remains, as if the pattern were never detected.

Chris

1 Answer


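If you only have a few specific symbols to change, it is easiest to write them as Unicode escapes. A pattern written as "\u00ae" rather than a literal ® no longer depends on how your script file is encoded, which is a common reason a literal pattern silently fails to match. For example, to change all occurrences of the registered trademark symbol (Unicode U+00AE) to the equivalent HTML entity (&reg;), and any degree symbols (U+00B0) to the entity &deg;, we can pass str_replace_all() a named vector of pattern = replacement pairs: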

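library(dplyr)
library(stringr)

special_char <- function(df) {
  # A named vector (pattern = replacement) applies every pair to every element;
  # two unnamed vectors would instead be paired up element by element
  mutate_all(df, .funs = ~ str_replace_all(.x, c("\u00ae" = "&reg;",
                                                 "\u00b0" = "&deg;")))
}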

So, if your data frame looks like this:

data <- data.frame(a = c("Stack Overflow®", "451°F"),
                   b = c("Coca Cola®", "22°F"))
data
#>                 a          b
#> 1 Stack Overflow® Coca Cola®
#> 2           451°F       22°F

Your function will escape all relevant instances:

data %>% special_char()
#>                     a              b
#> 1 Stack Overflow&reg; Coca Cola&reg;
#> 2           451&deg;F       22&deg;F
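If you would rather not depend on dplyr and stringr, the same lookup-table idea can be sketched in base R. This is an illustrative alternative, not part of the original answer: the names special_char_base() and entities are hypothetical, and unlike mutate_all() it leaves non-character columns untouched instead of converting them to text.

```r
# Hypothetical lookup table: Unicode escape -> HTML entity (extend as needed)
entities <- c("\u00ae" = "&reg;", "\u00b0" = "&deg;")

special_char_base <- function(df, table = entities) {
  df[] <- lapply(df, function(col) {
    if (!is.character(col)) return(col)  # leave non-text columns as they are
    for (sym in names(table)) {
      # fixed = TRUE matches the symbol literally instead of as a regex
      col <- gsub(sym, table[[sym]], col, fixed = TRUE)
    }
    col
  })
  df
}

data <- data.frame(a = c("Stack Overflow\u00ae", "451\u00b0F"),
                   b = c("Coca Cola\u00ae", "22\u00b0F"),
                   stringsAsFactors = FALSE)
special_char_base(data)
```

Each symbol is replaced in turn across the whole column, so adding another pair to the table is all it takes to escape another symbol.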

If you want all non-ASCII characters encoded as HTML entities, a more general solution is to use numeric character references (for example &#174; for ® and &#176; for °). These are less human-readable, but probably the go-to option if you have many different symbols to escape. A useful starting point would be Mr Flick's solution here, though you would need to vectorize that function to get it working with data frame columns.
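A rough sketch of such a vectorized encoder, written independently and not taken from Mr Flick's answer: the helper names encode_non_ascii() and encode_df() are hypothetical. It uses base R's utf8ToInt() to get each character's code point and rewrites anything above 127 (outside ASCII) as a decimal reference &#NNN;.

```r
# Hypothetical helper: encode every non-ASCII character in a character vector
# as a decimal numeric character reference such as &#174;
encode_non_ascii <- function(x) {
  vapply(x, function(s) {
    if (is.na(s)) return(NA_character_)
    cps <- utf8ToInt(enc2utf8(s))             # code point of each character
    chars <- intToUtf8(cps, multiple = TRUE)  # the characters themselves
    non_ascii <- cps > 127
    chars[non_ascii] <- paste0("&#", cps[non_ascii], ";")
    paste(chars, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

# Apply it to the character columns of a data frame, leaving other columns alone
encode_df <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.character(col)) encode_non_ascii(col) else col
  })
  df
}

data <- data.frame(a = c("Stack Overflow\u00ae", "451\u00b0F"),
                   b = c("Coca Cola\u00ae", "22\u00b0F"),
                   stringsAsFactors = FALSE)
encode_df(data)
```

Because numeric references cover any code point, no lookup table is needed; the trade-off is that the output is harder to read than named entities like &reg;.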

Allan Cameron