
I am trying to clean the cells of a data frame. I want to remove some strings, but gsub somehow leaves "()" behind. My code:

getridof <- c("(a)", "(40X)", "(5X)", "(10X_a)", "(10X)", "(_)")

for (i in 1:length(getridof)) {
  df2$Sample <- gsub(getridof[i], "", df2$Sample)  
}

but "()" is still left in the cells after running the script. Why?
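A minimal reproduction of the problem, assuming a made-up vector `x` in place of `df2$Sample` (the real data was not posted). By default gsub interprets its pattern as a regular expression, where `(` and `)` form a capture group instead of matching literal parentheses, so `"(a)"` matches only the letter `a`:

```r
# Hypothetical sample values standing in for df2$Sample
x <- c("(a)100", "(40X)33")

# Regex mode (the default): "(a)" is a group matching just "a",
# so the parentheses themselves survive
gsub("(a)", "", x)
#> [1] "()100"   "(40X)33"

# Literal mode: fixed = TRUE matches the three characters "(a)" exactly
gsub("(a)", "", x, fixed = TRUE)
#> [1] "100"     "(40X)33"
```

A side effect worth noting: in regex mode the `"(a)"` pattern removes every letter `a` anywhere in the column, not only the bracketed token.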

mschmidt
  • Have a look at this answer, maybe it helps: https://stackoverflow.com/a/49681981/8689518 – RamsesII Feb 25 '22 at 10:27
  • Does this answer your question? [Removing/replacing brackets from R string using gsub](https://stackoverflow.com/questions/49681952/removing-replacing-brackets-from-r-string-using-gsub) – Maël Feb 25 '22 at 11:16

4 Answers


A possible solution, but I am not sure whether you only want to remove parentheses:

library(tidyverse)

getridof <- c("(a)", "(40X)", "(5X)", "(10X_a)", "(10X)", "(_)")

getridof %>% 
  str_remove("^\\(") %>% 
  str_remove("\\)$") 

#> [1] "a"     "40X"   "5X"    "10X_a" "10X"   "_"
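Since the pipe syntax came up as a sticking point in the comments, here is a sketch of the same two anchored removals in plain base R, using `sub` (one replacement per element is enough, because each anchored pattern can match at most once):

```r
getridof <- c("(a)", "(40X)", "(5X)", "(10X_a)", "(10X)", "(_)")

# Drop a leading "(" and then a trailing ")"; the escaped "\\(" and "\\)"
# make the parentheses literal in the regular expression
no_open <- sub("^\\(", "", getridof)
sub("\\)$", "", no_open)
#> [1] "a"     "40X"   "5X"    "10X_a" "10X"   "_"
```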

Taking the alternative interpretation of your question:

library(tidyverse)

getridof <- c("(a)", "(40X)", "(5X)", "(10X_a)", "(10X)", "(_)")
data <- c("(a)100", "(40X)33", "nothing", "zzzz(5X)", "22(10X_a)44", "yyy(10X)", "aa(_)b")

getridof <- getridof %>% 
  str_replace("\\(", "\\\\(") %>% 
  str_replace("\\)", "\\\\)") %>% 
  str_c(collapse = "|")
  
str_replace_all(data, getridof, "")

#> [1] "100"     "33"      "nothing" "zzzz"    "2244"    "yyy"     "aab"
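The same escape-then-alternate idea can be sketched in base R without stringr. This assumes the only regex metacharacters in `getridof` are parentheses, which holds for the strings in the question:

```r
getridof <- c("(a)", "(40X)", "(5X)", "(10X_a)", "(10X)", "(_)")
data <- c("(a)100", "(40X)33", "nothing", "zzzz(5X)", "22(10X_a)44", "yyy(10X)", "aa(_)b")

# Put a backslash before every "(" or ")" so they are matched literally.
# Only parentheses are escaped here; characters such as "." or "+" would
# need the same treatment if they ever appeared in getridof.
escaped <- gsub("([()])", "\\\\\\1", getridof)

# Join the escaped strings into one alternation: \(a\)|\(40X\)|...
pattern <- paste(escaped, collapse = "|")

gsub(pattern, "", data)
#> [1] "100"     "33"      "nothing" "zzzz"    "2244"    "yyy"     "aab"
```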
PaulS
  • Thanks for the fast reply! I want the whole strings removed, including "(" and ")", not just the brackets. Your solution will still be useful to me in some other cases. However, I don't understand the tidyverse syntax. – mschmidt Feb 25 '22 at 10:34
  • If you add a minimal example of your `df2$Sample`, we will be able to offer you a full solution, @mschmidt. – PaulS Feb 25 '22 at 10:36

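This uses purrr::reduce together with the fixed = TRUE argument of gsub, which makes gsub treat each pattern as a literal string instead of a regular expression: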

library(purrr)
data <- c("(a)100", "(40X)33", "nothing")

getridof <- c("(a)", "(40X)", "(5X)", "(10X_a)", "(10X)", "(_)")

purrr::reduce(getridof,
              ~gsub(.y, "", .x, fixed = TRUE),
              .init = data)

# [1] "100"     "33"      "nothing" 

The purrr::reduce function replaces your for loop: it applies gsub once per unwanted string, each time to the result of the previous step, so every entry of getridof is deleted from data in turn.
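The same fold is available without purrr through base R's `Reduce()`. A sketch reusing the example vectors from the answer:

```r
data <- c("(a)100", "(40X)33", "nothing")
getridof <- c("(a)", "(40X)", "(5X)", "(10X_a)", "(10X)", "(_)")

# Reduce() calls f(accumulated, next_pattern) for each pattern in turn,
# starting from init = data, so each gsub() works on the previous result
cleaned <- Reduce(function(acc, pat) gsub(pat, "", acc, fixed = TRUE),
                  getridof, init = data)
cleaned
#> [1] "100"     "33"      "nothing"
```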

Stefano Barbi

Using gsub:

gsub("[()]", "", getridof)

[1] "a"     "40X"   "5X"    "10X_a" "10X"   "_"  

Using stringr:

library(stringr)
str_remove_all(getridof, "[()]")

[1] "a"     "40X"   "5X"    "10X_a" "10X"   "_"
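Removing only the brackets leaves their contents in place. If every unwanted token is a parenthesised group, as in the question, a single pattern can remove each group together with its contents. This is a sketch under that assumption: it would also delete any parenthesised text you want to keep, and the sample values below are made up.

```r
# Hypothetical sample values; the real df2$Sample was not posted
data <- c("(a)100", "(40X)33", "nothing", "zzzz(5X)", "22(10X_a)44")

# "\\(" and "\\)" match literal parentheses; "[^)]*" matches everything
# up to the next closing parenthesis
gsub("\\([^)]*\\)", "", data)
#> [1] "100"     "33"      "nothing" "zzzz"    "2244"
```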
bird

Adding the argument fixed = TRUE did the job:

df2$Sample <- gsub(getridof[i], "", df2$Sample, fixed = TRUE)
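For completeness, the fixed line placed back inside the original loop. The `df2` below is a hypothetical stand-in, since the real data frame was not posted:

```r
# Hypothetical data frame standing in for the asker's df2
df2 <- data.frame(Sample = c("(a)100", "(40X)33", "22(10X_a)44"))
getridof <- c("(a)", "(40X)", "(5X)", "(10X_a)", "(10X)", "(_)")

# fixed = TRUE makes gsub() match each string literally, parentheses included
for (i in seq_along(getridof)) {
  df2$Sample <- gsub(getridof[i], "", df2$Sample, fixed = TRUE)
}
df2$Sample
#> [1] "100"  "33"   "2244"
```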
mschmidt