0

I am trying to improve the readability of automated text generation based on a database query.

Is there a neat way to perform these substitutions, i.e. to do the following in one command instead of six?

x <- c("Te( )st", "Test()", "Test ()", "Test ( )", "Test ,,", "Test,, ", "Test , ")
# desired result
out <- c("Test", "Test", "Test", "Test", "Test,", "Test, ", "Test,")

# current approach: six literal (fixed = TRUE) substitutions applied in order
x <- gsub(pattern = "( ", replacement = "(", x, fixed = TRUE)
x <- gsub(pattern = " )", replacement = ")", x, fixed = TRUE)
x <- gsub(pattern = " ,", replacement = ",", x, fixed = TRUE)
x <- gsub(pattern = "()", replacement = "", x, fixed = TRUE)
x <- gsub(pattern = ",,", replacement = ",", x, fixed = TRUE)
x <- gsub(pattern = " ,", replacement = ",", x, fixed = TRUE)
Wiktor Stribiżew
ECII

3 Answers

3

You can use

x <- c("Te( )st", "Test()", "Test ()", "Test ( )", "Test ,,", "Test,, ", "Test , ")
gsub("\\(\\s*\\)|\\s+(?=[,)])|(?<=\\()\\s+|(,),+", "\\1", x, perl = TRUE)
# => [1] "Test"   "Test"   "Test "  "Test "  "Test,"  "Test, " "Test, "

See the R demo online and the regex demo. Details:

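  • \(\s*\)| - a ( char, zero or more whitespace chars, and then a ) char, or
  • \s+(?=[,)])| - one or more whitespace chars immediately followed by either , or ) (a lookahead, so that char itself is not consumed), or
  • (?<=\()\s+| - one or more whitespace chars immediately preceded by a ( char (a lookbehind, so the ( itself is not consumed), or
  • (,),+ - a comma captured into Group 1, followed by one or more further commas.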

The replacement is the Group 1 value: if Group 1 matched, the replacement is a single comma; otherwise, it is an empty string.
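To make the four alternatives easier to maintain, the same pattern can be assembled from commented pieces. This is only a sketch of one way to lay it out; the variable names `pattern` and `result` are illustrative, and the regex itself is unchanged from the one above.

```r
x <- c("Te( )st", "Test()", "Test ()", "Test ( )", "Test ,,", "Test,, ", "Test , ")

# Join the four alternatives with "|" so each one can carry its own comment.
pattern <- paste(
  "\\(\\s*\\)",    # empty or whitespace-only parentheses
  "\\s+(?=[,)])",  # whitespace directly before , or )
  "(?<=\\()\\s+",  # whitespace directly after (
  "(,),+",         # run of commas; the first one is captured as Group 1
  sep = "|"
)

# "\\1" re-inserts Group 1 (a single comma) or nothing if Group 1 did not match.
result <- gsub(pattern, "\\1", x, perl = TRUE)
print(result)
# => [1] "Test"   "Test"   "Test "  "Test "  "Test,"  "Test, " "Test, "
```

Note that whitespace not adjacent to a parenthesis or comma (such as the space left in "Test ()") is kept, so a trailing space can remain.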

Wiktor Stribiżew
1

You can use mgsub::mgsub.

a <- c("( ", " )", " ,", "()", ",,")  # patterns
b <- c("(", ")", ",", "", ",")        # replacements
x <- c("Te( )st", "Test()", "Test ()", "Test ( )", "Test ,,", "Test,, ", "Test , ")

mgsub::mgsub(x, a, b, fixed = TRUE)
#[1] "Te()st"  "Test"    "Test "   "Test ()" "Test,,"  "Test, "  "Test, " 

You might want to add other patterns to get the output you want.
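As the output above suggests, the replacements do not feed into one another here ("Te( )st" becomes "Te()st" rather than "Test"). If the chained behaviour of the six gsub calls in the question is what you want, a base R sketch is to loop over the pattern/replacement pairs in order. The helper name `apply_in_order` is hypothetical, not part of any package.

```r
x <- c("Te( )st", "Test()", "Test ()", "Test ( )", "Test ,,", "Test,, ", "Test , ")

# Same pairs, in the same order, as the six gsub calls in the question.
patterns     <- c("( ", " )", " ,", "()", ",,", " ,")
replacements <- c("(",  ")",  ",",  "",   ",",  ",")

# Hypothetical helper: apply each literal substitution to the result of the
# previous one, so later patterns see the earlier replacements.
apply_in_order <- function(text, patterns, replacements) {
  stopifnot(length(patterns) == length(replacements))
  for (i in seq_along(patterns)) {
    text <- gsub(patterns[i], replacements[i], text, fixed = TRUE)
  }
  text
}

result <- apply_in_order(x, patterns, replacements)
print(result)
# => [1] "Test"   "Test"   "Test "  "Test "  "Test,"  "Test, " "Test, "
```

This keeps everything in one call at the point of use while preserving the sequential semantics of the original code.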

Maël
0

You can use the multigsub function, which is a wrapper around the gsub function in R. You can find the documentation here.

Here's the code:

multigsub(c("( ", " )", " ,", "()", ",,", " ,"), c("(", ")", ",", "", ",", ","), x, fixed = TRUE)
Vishal A.