4

I have a string "3.4(2.5-4.7)" and want to insert a white space before the open bracket "(" so that the string becomes "3.4 (2.5-4.7)".

Any idea how this could be done in R?

Ric S
Patrick

3 Answers

3
x <- "3.4(2.5-4.7)"
sub("(.*)(?=\\()", "\\1 ", x, perl = TRUE)
[1] "3.4 (2.5-4.7)"

This regex is based on lookahead: it creates one capturing group, (.*), subsuming everything up to the lookahead, namely the opening parenthesis (?=\\(), recalls it as \\1 and inserts one whitespace after it in the replacement argument to sub. This is enough for one substitution per string; for several, gsub is needed, and the greedy .* would have to change as well, since it runs up to the last opening parenthesis. The argument perl = TRUE needs to be added to enable the lookahead.
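
As a minimal sketch of the same lookahead idea without the capturing group: the pattern below matches only the empty position in front of each "(" and replaces it with a space, so gsub can treat several parentheses in one string. The input y is made up for illustration and is not from the question.

```r
# Hypothetical input with two parenthesised ranges (not from the question)
y <- "3.4(2.5-4.7) and 5.1(4.0-6.2)"

# (?=\\() is a zero-width lookahead: it matches the position right before "("
# without consuming the parenthesis, so the replacement is a pure insertion.
spaced <- gsub("(?=\\()", " ", y, perl = TRUE)
print(spaced)
```

Because nothing is consumed, no backreference is needed in the replacement.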

EDIT:

If you have a string like this:

x <- "3.4(2.5to4.7)"

the regex gets slightly more complex; the underlying idea, though, remains the same: you divide the string into different capturing groups (...), which you then recall using the appropriate backreferences in the replacement argument while adding the sought spaces:

sub("(.*)(\\(\\d+\\.\\d+)(to)(\\d+\\.\\d+\\))", "\\1 \\2 \\3 \\4", x)
[1] "3.4 (2.5 to 4.7)"

EDIT2:

x <- '3.4(2.5,4.7)'
sub("(.*)(\\(\\d+\\.\\d+)(,)(\\d+\\.\\d+\\))", "\\1 \\2\\3 \\4", x)
[1] "3.4 (2.5, 4.7)"

EDIT3:

x <- '3(2,4)'
sub("(.*)(\\(\\d+)(,)(\\d+)", "\\1 \\2\\3 \\4", x)
[1] "3 (2, 4)"
Chris Ruehlemann
  • 1
    I've edited my answer to address all variants you have mentioned in the comments. Does this help? – Chris Ruehlemann Jul 31 '20 at 08:20
  • Thank you for your edit. Your EDIT and EDIT2 worked well for `x <- '3.4(2.5,4.7)'`, but they did not work when `x <- '3(2,4)'`. It seems that numbers within the parentheses are required to have at least 1 decimal digit. Is there any solution when the numbers are integers? – Patrick Jul 31 '20 at 08:42
  • I tried sub("(.*)(\\(\\d+\\.*\\d*)(to)(\\d+\\.*\\d+\\))", "\\1 \\2 \\3 \\4", a) and it seemed to work. Substituting + with * allows for matching at least 0 times rather than at least 1 time. Please let me know if I am anywhere wrong. Thx. – Patrick Jul 31 '20 at 08:56
  • Please see new edit. What exactly is your question regarding this code of yours: `sub("(.*)(\\(\\d+\\.*\\d*)(to)(\\d+\\.*\\d+\\))", "\\1 \\2 \\3 \\4", a)`? Another thing: which of the answers **really**, or at least **most closely**, answers your question? – Chris Ruehlemann Jul 31 '20 at 09:59
  • I tried `sub("(.*)(\\(\\d+\\.*\\d*)(to)(\\d+\\.*\\d+\\))", "\\1 \\2 \\3 \\4", a)` to solve the issue that `x <- '3(2to4)'` is not separated by spaces correctly when following the EDIT. Your EDIT3 solved this issue, while `sub("(.*)(\\(\\d+\\.*\\d*)(to)(\\d+\\.*\\d+\\))", "\\1 \\2 \\3 \\4", a)` may be directly used for both integers and numbers with decimals. Sorry for my ambiguity. Hope this clarifies. – Patrick Jul 31 '20 at 10:25
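
A hedged sketch of one way to cover integers and decimals with a single pattern, following the discussion in the comments above. Note that in the commenter's pattern the final `\\d+\\.*\\d+` still requires at least two digits, so a one-digit integer such as 4 would not match there. The variant below instead uses `\\d+\\.?\\d*` (digits, an optional decimal point, optional further digits) on both sides; the names pattern, inputs and result are introduced here for illustration only.

```r
# Assumed variant, not taken from any of the answers:
# \\d+\\.?\\d* = one or more digits, an optional ".", then zero or more digits
pattern <- "(.*)(\\(\\d+\\.?\\d*)(to)(\\d+\\.?\\d*\\))"

# Illustrative inputs mixing integers and decimals
inputs <- c("3(2to4)", "3.4(2.5to4.7)", "10(8to12.5)")

# sub() is vectorised over its third argument, so all strings are handled at once
result <- sub(pattern, "\\1 \\2 \\3 \\4", inputs)
print(result)
```

This pattern is deliberately loose (it would also accept "2." as a number); tighten it if the input can contain such forms.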
2

Try

gsub('(.*)(\\(.*\\))', '\\1 \\2', '3.4(2.5-4.7)')
#[1] "3.4 (2.5-4.7)"

The way the regex works is that it creates two groups. The first group, (.*), captures everything before the opening parenthesis, and the second group, (\\(.*\\)), captures the parenthesized part, from the opening to the closing parenthesis. Note that we need to escape the parentheses, so we use \\( and \\). We then join those two groups with a space between them: \\1 \\2.
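
To see what each group captures, here is a small sketch using base R's regexec and regmatches with the same pattern; the variable m is introduced only for this illustration.

```r
x <- "3.4(2.5-4.7)"

# regexec() reports the full match plus one entry per capturing group;
# regmatches() extracts the corresponding substrings.
m <- regmatches(x, regexec("(.*)(\\(.*\\))", x))[[1]]

# m[1] is the full match, m[2] is group 1, m[3] is group 2
print(m)
```

The replacement '\\1 \\2' then simply pastes m[2] and m[3] back together with a space in between.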

Sotos
  • Thanks Sotos. That worked perfectly. May I ask for the meaning of . * \\ and why they are enclosed in brackets. Any suggested website is also welcome. regex has been troubling me for a long time but I have not yet known how best to get started. – Patrick Jul 31 '20 at 07:42
  • 1
    I added some explanation. – Sotos Jul 31 '20 at 07:52
  • What about converting '3.4(2.5to4.7)' to '3.4 (2.5 to 4.7)'? In addition to adding a space before the parenthesis, I also need a space before and after "to". And what about converting '3.4(2.5,4.7)' to '3.4 (2.5, 4.7)'? Sorry for these separate questions in an old thread. Thx – Patrick Jul 31 '20 at 08:01
2

A very short way uses sub, which will substitute the first open bracket "(" with a space followed by an open bracket, i.e. " (".

x <- '3.4(2.5-4.7)'
sub("\\(", " (", x)
# [1] "3.4 (2.5-4.7)"

Alternatively, you can specify the argument fixed = TRUE which considers the pattern as fixed and not as a regular expression.

x <- '3.4(2.5-4.7)'
sub("(", " (", x, fixed = TRUE)
# [1] "3.4 (2.5-4.7)"
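
A brief sketch, under the assumption that you have several strings or several brackets per string: sub is vectorised over its input, and gsub with fixed = TRUE replaces every literal "(" rather than only the first. The vectors below are made up for illustration.

```r
# Hypothetical inputs, including one without any bracket
values <- c("3.4(2.5-4.7)", "10(8-12)", "no brackets")

# sub() handles each element separately and leaves non-matching ones unchanged
first_only <- sub("(", " (", values, fixed = TRUE)
print(first_only)

# gsub() replaces every occurrence within a string
multi <- gsub("(", " (", "1(2)3(4)", fixed = TRUE)
print(multi)
```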
Ric S