5

In R, I am trying to replace commas that have a non-whitespace character on both sides with a space, while leaving all other commas untouched.

Imagine I have:

j<-"Abc,Abc, and c"

and I want:

"Abc Abc, and c"

This almost works:

gsub("[^ ],[^ ]"," " ,j)

But it also removes the characters on either side of the comma, giving:

"Ab bc, and c"
Ronak Shah
tsutsume

4 Answers

5

You may use a PCRE regex with a negative lookbehind and lookahead:

j <- "Abc,Abc, and c"
gsub("(?<!\\s),(?!\\s)", " ", j, perl = TRUE)
## => [1] "Abc Abc, and c"

See the regex demo

Details:

  • (?<!\\s) - there cannot be a whitespace right before a ,
  • , - a literal ,
  • (?!\\s) - there cannot be a whitespace right after a ,

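An alternative solution is to match a , that is enclosed with word boundaries (this only matches when the characters on both sides of the comma are word characters, i.e. letters, digits or _):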

j <- "Abc,Abc, and c"
gsub("\\b,\\b", " ", j)
## => [1] "Abc Abc, and c"

See another R demo.
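
To illustrate how the two patterns above can differ, here is a small sketch on an input that is not from the question (the string `x` is a made-up example). When a comma sits between punctuation characters, there is no word boundary next to it, so `\\b,\\b` leaves it alone, while the lookaround version still replaces it.

```r
# Hypothetical input: one comma sits between ")" and "(", another between letters
x <- "f(a),(b) and c,d"

# Word boundaries: only the comma between word characters is replaced
wb <- gsub("\\b,\\b", " ", x)
print(wb)
## => [1] "f(a),(b) and c d"

# Negative lookarounds: any comma without adjacent whitespace is replaced
la <- gsub("(?<!\\s),(?!\\s)", " ", x, perl = TRUE)
print(la)
## => [1] "f(a) (b) and c d"
```

Which behaviour is preferable depends on whether punctuation next to a comma should count as "non-whitespace" for your data.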

Wiktor Stribiżew
  • Is this functionally equivalent: `"(?<=\\S),(?=\\S)"`? – nrussell Mar 01 '17 at 12:44
  • No, negative lookarounds are not equivalent to positive ones, as positive lookarounds require the presence of the pattern. Usually, the difference is seen at start/end of string positions. `(?<=\S)` requires a non-whitespace before the next subpattern, thus, there will be no match at the start of the string. `(?<!\s)` means there cannot be a whitespace before, but the start of string can be there. – Wiktor Stribiżew Mar 01 '17 at 12:48
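
The start/end-of-string difference discussed in the comments can be checked with a short sketch. The input `x` below is a made-up string (not from the question) that begins and ends with a comma.

```r
# Hypothetical input with a leading and a trailing comma
x <- ",Abc,Abc, and c,"

# Negative lookarounds: "no whitespace next to the comma" is also true
# at the start and end of the string, so those commas are replaced
neg <- gsub("(?<!\\s),(?!\\s)", " ", x, perl = TRUE)
print(neg)
## => [1] " Abc Abc, and c "

# Positive lookarounds: a non-whitespace character must actually be present
# on both sides, so the leading and trailing commas are left untouched
pos <- gsub("(?<=\\S),(?=\\S)", " ", x, perl = TRUE)
print(pos)
## => [1] ",Abc Abc, and c,"
```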
3

You can use back references like this:

gsub("([^ ]),([^ ])","\\1 \\2" ,j)
[1] "Abc Abc, and c"

The () in the regular expression capture the characters adjacent to the comma. The \\1 and \\2 in the replacement insert these captured values back, in the order they were captured.
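
One caveat worth sketching: because the capture groups consume the neighbouring characters, two commas separated by a single character cannot both match in one pass. The input `x` below is a made-up example, and the lookaround pattern is shown only for comparison.

```r
# Hypothetical input: single characters separated by commas
x <- "a,b,c"

# Capture groups consume "a" and "b", so the second comma has no
# unconsumed character left on its left side and is skipped
captured <- gsub("([^ ]),([^ ])", "\\1 \\2", x)
print(captured)
## => [1] "a b,c"

# Lookarounds do not consume the neighbours, so both commas are replaced
lookaround <- gsub("(?<!\\s),(?!\\s)", " ", x, perl = TRUE)
print(lookaround)
## => [1] "a b c"
```

For the string in the question this makes no difference, since at least two characters separate the commas there.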

lmo
3

We can try a positive lookahead, which only checks that the character after the comma is not a space:

gsub(",(?=[^ ])", " ", j, perl = TRUE)
#[1] "Abc Abc, and c"
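
As a rough illustration of what the lookahead-only pattern does and does not check, consider a made-up input `x` (not from the question) in which a comma is preceded by a space. The pattern above replaces that comma, whereas a pattern with lookarounds on both sides leaves it alone.

```r
# Hypothetical input: the first comma has a space before it
x <- "Abc ,Abc, and c"

# Lookahead only: the character before the comma is not inspected,
# so the first comma is replaced (leaving two consecutive spaces)
ahead_only <- gsub(",(?=[^ ])", " ", x, perl = TRUE)
print(ahead_only)
## => [1] "Abc  Abc, and c"

# Lookbehind and lookahead: the first comma is kept
both_sides <- gsub("(?<!\\s),(?!\\s)", " ", x, perl = TRUE)
print(both_sides)
## => [1] "Abc ,Abc, and c"
```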
akrun
0

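This may also work, using the stringr package:

library("stringr")
j <- "Abc,Abc, and c"
# str_replace() replaces only the first match; str_replace_all() replaces every match
str_replace_all(j, "(\\w+),(\\w+)", "\\1 \\2")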
Vida Wang