I noticed a different behaviour between stringr::str_replace and gsub.

Sample data:

dat <- structure(list(country_name = c("Burkina", "Burkina", "Burkina", 
"Burkina", "Burkina", "Burkina"), region_name = c("BOUCLE DU MOUHOUN_NA", 
"BOUCLE DU MOUHOUN_NA", "BOUCLE DU MOUHOUN_NA", "BOUCLE DU MOUHOUN_NA", 
"BOUCLE DU MOUHOUN_NA", "BOUCLE DU MOUHOUN_NA"), lat = c("NA_NA", 
"NA_NA", "NA_NA", "NA_NA", "NA_NA", "NA_NA"), lon = c("NA_NA", 
"NA_NA", "NA_NA", "NA_NA", "NA_NA", "NA_NA"), farm_size_ha = c("3_NA", 
"5.67_NA", "8_NA", "46_NA", "29.5_NA", "20_NA"), plot_number = c("4_NA_NA", 
"5_NA_NA", "3_NA_NA", "3_NA_NA", "9_NA_NA", "5_NA_NA")), row.names = c(NA, 
6L), class = "data.frame")

Using str_replace:

dat %>% mutate_all(~str_replace(., pattern = 'NA_|_NA', replacement = ''))

Gives:

country_name       region_name lat lon farm_size_ha plot_number
     Burkina BOUCLE DU MOUHOUN  NA  NA            3        4_NA
     Burkina BOUCLE DU MOUHOUN  NA  NA         5.67        5_NA
     Burkina BOUCLE DU MOUHOUN  NA  NA            8        3_NA
     Burkina BOUCLE DU MOUHOUN  NA  NA           46        3_NA
     Burkina BOUCLE DU MOUHOUN  NA  NA         29.5        9_NA
     Burkina BOUCLE DU MOUHOUN  NA  NA           20        5_NA

Using gsub:

dat %>% mutate_all(~gsub(pattern = 'NA_|_NA', replacement = '', .))

Gives the desired output:

country_name       region_name lat lon farm_size_ha plot_number
     Burkina BOUCLE DU MOUHOUN  NA  NA            3           4
     Burkina BOUCLE DU MOUHOUN  NA  NA         5.67           5
     Burkina BOUCLE DU MOUHOUN  NA  NA            8           3
     Burkina BOUCLE DU MOUHOUN  NA  NA           46           3
     Burkina BOUCLE DU MOUHOUN  NA  NA         29.5           9
     Burkina BOUCLE DU MOUHOUN  NA  NA           20           5

Could you explain why they behave differently, and how to make str_replace behave just like gsub?

tom

1 Answer

str_replace (like sub) replaces only the first match in each string. To replace every match, use str_replace_all (like gsub).

library(dplyr)
library(stringr)

# Apply to every column; calling across() without .cols is deprecated since dplyr 1.1.0
dat %>% mutate(across(everything(), ~ str_replace_all(.x, pattern = 'NA_|_NA', replacement = '')))

Or use str_remove_all, as mentioned by @csgroen, which is shorthand for str_replace_all with replacement = "".

dat %>% mutate(across(everything(), ~ str_remove_all(.x, pattern = 'NA_|_NA')))


#  country_name       region_name lat lon farm_size_ha plot_number
#1      Burkina BOUCLE DU MOUHOUN  NA  NA            3           4
#2      Burkina BOUCLE DU MOUHOUN  NA  NA         5.67           5
#3      Burkina BOUCLE DU MOUHOUN  NA  NA            8           3
#4      Burkina BOUCLE DU MOUHOUN  NA  NA           46           3
#5      Burkina BOUCLE DU MOUHOUN  NA  NA         29.5           9
#6      Burkina BOUCLE DU MOUHOUN  NA  NA           20           5
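
To see the first-match versus all-matches difference in isolation, here is a minimal sketch on a few values taken from the sample data. The vector x and the result names are made up for illustration; only sub, gsub, str_replace and str_replace_all come from base R and stringr.

```r
library(stringr)

# A few values from the sample data (x is a hypothetical helper vector)
x <- c("NA_NA", "4_NA_NA", "BOUCLE DU MOUHOUN_NA")

# First match only: sub() and str_replace() agree
first_base    <- sub("NA_|_NA", "", x)
first_stringr <- str_replace(x, "NA_|_NA", "")
first_stringr
# [1] "NA"                "4_NA"              "BOUCLE DU MOUHOUN"

# Every match: gsub() and str_replace_all() agree
all_base    <- gsub("NA_|_NA", "", x)
all_stringr <- str_replace_all(x, "NA_|_NA", "")
all_stringr
# [1] "NA"                "4"                 "BOUCLE DU MOUHOUN"
```

In "4_NA_NA" the pattern matches twice, so the first-match functions leave one "_NA" behind. That is the leftover seen in the plot_number column of the question.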
Ronak Shah
  • Thanks a lot, I was not aware of the _all suffix for str_replace, nor of str_remove. That's very good to know! (I don't really get the downvote) – tom Jul 17 '20 at 08:42