I want to add a space between two punctuation characters (+ and -). I have this code:

s <- "-+"
str_replace(s, "([:punct:])([:punct:])", "\\1\\s\\2")

It does not work. May I have some help?

JeffC
Lilly
  • I don't know about R, but with PCRE, for example, you would want `([[:punct:]])([[:punct:]])`. Another way, again with PCRE, would be to replace the (zero-width) match of `(?<=[[:punct:]])(?=[[:punct:]])` with a space (`(?<=[[:punct:]])` being a *positive lookbehind* and `(?=[[:punct:]])` being a *positive lookahead*). [Demo](https://regex101.com/r/N1JmjC/1). – Cary Swoveland Feb 16 '23 at 06:20
  • With `stringr` and `stringi`, some punctuation characters (e.g. `'+'`) cannot be matched by `[:punct:]`. But `sub`/`gsub` from `base` R handle them well. Try `gsub("([[:punct:]])([[:punct:]])", "\\1 \\2", s)` – Darren Tsai Feb 16 '23 at 06:43
  • See https://stackoverflow.com/q/26348643/10068985 – Darren Tsai Feb 16 '23 at 06:44
  • `str_replace` uses the ICU engine, which is different from the PCRE engine used by base R. `[[:punct:]]` is a metaclass supported by PCRE, not by ICU – Onyambu Feb 16 '23 at 08:28

1 Answer

There are several issues here:

  • In the ICU regex flavor, the [:punct:] pattern does not match math symbols (\p{S}); it only matches punctuation proper (\p{P}). To match both, combine the two classes: [\p{P}\p{S}].
  • The "\\1\\s\\2" replacement contains a \s regex escape sequence. Such escapes are not supported in replacement patterns, so you need to use a literal space.
  • str_replace only replaces the first occurrence. Use str_replace_all to handle all matches.
  • Even with all of the above, it still won't work for strings like -+?/, because each match consumes two characters and the pair in the middle is never split. Make the second part of the regex a zero-width assertion (a positive lookahead) so that the second punctuation character is not consumed.
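The first and last points can be checked with a short sketch. It uses only base R with perl = TRUE (the PCRE engine) rather than stringr, so it is an analogue of the ICU behavior and not a test of stringr itself; the variable names are illustrative.

```r
# Base R only; perl = TRUE selects the PCRE engine.
s <- "-+?="

# Unicode categories: "+" is a symbol (\p{S}), not punctuation (\p{P}).
plus_is_symbol <- grepl("\\p{S}", "+", perl = TRUE)
plus_is_punct  <- grepl("\\p{P}", "+", perl = TRUE)

# Consuming both characters: the matches are "-+" and "?=",
# so the pair "+?" in the middle is never split.
consumed <- gsub("([[:punct:]])([[:punct:]])", "\\1 \\2", s, perl = TRUE)

# Lookahead: only the first character is consumed, so every
# adjacent pair of punctuation characters gets a space.
lookahead <- gsub("([[:punct:]])(?=[[:punct:]])", "\\1 ", s, perl = TRUE)

print(consumed)   # "- +? ="
print(lookahead)  # "- + ? ="
```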

So, you can use

library(stringr)
s <- "-+?="
str_replace_all(s, "([\\p{P}\\p{S}])(?=[\\p{P}\\p{S}])", "\\1 ")
str_replace_all(s, "(?<=[\\p{P}\\p{S}])(?=[\\p{P}\\p{S}])", " ")
gsub("(?<=[[:punct:]])(?=[[:punct:]])", " ", s, perl=TRUE)

All three replacement calls yield the output [1] "- + ? =".

Note that in the PCRE regex flavor (used with gsub and perl=TRUE), a POSIX character class must be put inside a bracket expression, hence the double brackets in [[:punct:]].

Also, (?<=[[:punct:]]) is a positive lookbehind that checks for the presence of its pattern immediately to the left of the current position. Since it is non-consuming, there is no need for a backreference in the replacement.
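Because both lookarounds are zero-width, the same gsub call can be wrapped in a small helper and applied to a whole character vector. This is a minimal sketch; the function name space_between_punct is hypothetical and not part of any package.

```r
# Hypothetical helper: insert a space at every position that has a
# punctuation character on both its left and its right.
space_between_punct <- function(x) {
  gsub("(?<=[[:punct:]])(?=[[:punct:]])", " ", x, perl = TRUE)
}

result <- space_between_punct(c("-+", "a+b", "-+?/"))
print(result)  # "- +"  "a+b"  "- + ? /"
```

The "a+b" element is left unchanged because its single punctuation character has no punctuation neighbor.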

Wiktor Stribiżew