5

I'm new to R and am stuck with backreferencing that doesn't seem to work. In:

gsub("\\((\\d+)\\)", f("\\1"), string)

It correctly grabs the number in between parentheses but doesn't apply the (correctly defined, working otherwise) function f to replace the number --> it's actually the string "\1" that passes through to f.
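For what it's worth, the behaviour described above can be reproduced with a minimal sketch. The toy `f` and the `calls` counter are made up for illustration only. R evaluates `f("\\1")` once, before `gsub` ever sees the text, so `f` receives the literal two-character string `\1`, and whatever it returns is then used as an ordinary replacement template:

```r
string <- "(31)O (29)M"

calls <- 0                      # hypothetical counter, only for this demo
f <- function(s) {
  calls <<- calls + 1           # record how often f is really invoked
  paste0("<", s, ">")           # pure string function, so "\\1" survives inside
}

# f("\\1") is evaluated first and returns "<\\1>", which gsub then expands
out <- gsub("\\((\\d+)\\)", f("\\1"), string)
out      # "<31>O <29>M"
calls    # 1: f ran once, not once per match

# A numeric function has nothing to work with: it only ever sees "\\1"
suppressWarnings(as.numeric("\\1"))   # NA
```

So a purely textual `f` can look as if it works, while anything that needs the matched number itself cannot.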

Am I missing something or is it just that R does not handle this? If so, any idea how I could do something similar, i.e. applying a function "on the fly" to the (actually many) numbers that occur in between parentheses in the text I'm parsing?

Thanks a lot for your help.

JMD
  • Extract the numbers to a vector, apply the function on that vector, feed the result to `gsub`. – Roland Aug 26 '14 at 13:17
  • Thanks! Yes, extracting to a vector with gregexpr/regmatches is easy and I had been thinking about this -- but how do I feed this back to gsub? – JMD Aug 26 '14 at 13:21
  • @JMD welcome to Stack Overflow. When you post, it is helpful to include a minimal data set as well. This link provides information on formatting questions: http://stackoverflow.com/help/how-to-ask – Tyler Rinker Aug 26 '14 at 13:40

4 Answers

7

R does not have the option of applying a function directly to a match via gsub. You'll actually have to extract the match, transform the value, then replace the value. This is relatively easy with the regmatches function. For example:

x<-"(990283)M (31)O (29)M (6360)M"

f<-function(x) {
    v<-as.numeric(substr(x,2,nchar(x)-1))
    paste0(v+5,".1")
}

m <- gregexpr("\\(\\d+\\)", x)
regmatches(x, m) <- lapply(regmatches(x, m), f)
x
# [1] "990288.1M 36.1O 34.1M 6365.1M"

You can make f do whatever you like; just make sure it's vector-friendly, because it receives all the matches from a string at once. You could also wrap this in your own function:

gsubf <- function(pattern, x, f) {
    m <- gregexpr(pattern, x)
    regmatches(x, m) <- lapply(regmatches(x, m), f)
    x   
}
gsubf("\\(\\d+\\)", x, f)
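On the "vector-friendly" point, here is a rough sketch of two ways to satisfy it. The callbacks `f_vec`, `f_scalar` and `f_wrapped` are hypothetical names, and the "double numbers below 100" rule is just an arbitrary example. `gsubf` hands `f` every match from a string in one character vector, so `f` either has to be vectorised itself (e.g. `ifelse()` rather than `if () else`) or be wrapped with something like `vapply()`:

```r
gsubf <- function(pattern, x, f) {
  m <- gregexpr(pattern, x)
  regmatches(x, m) <- lapply(regmatches(x, m), f)
  x
}

x <- "(990283)M (31)O (29)M (6360)M"

# Vectorised callback: ifelse() handles all matches in one call
f_vec <- function(hit) {
  v <- as.numeric(substr(hit, 2, nchar(hit) - 1))   # strip the parentheses
  as.character(ifelse(v < 100, v * 2, v))
}

# Scalar callback: if () else only works on one value at a time ...
f_scalar <- function(hit) {
  v <- as.numeric(substr(hit, 2, nchar(hit) - 1))
  if (v < 100) v * 2 else v
}
# ... so apply it to each match separately
f_wrapped <- function(hits) {
  vapply(hits, function(h) as.character(f_scalar(h)), character(1),
         USE.NAMES = FALSE)
}

res_vec    <- gsubf("\\(\\d+\\)", x, f_vec)
res_scalar <- gsubf("\\(\\d+\\)", x, f_wrapped)
res_vec
# [1] "990283M 62O 58M 6360M"
```

Both calls give the same string; the `vapply()` wrapper is simply the fallback when the transformation cannot easily be written in vectorised form.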

Note that in these examples we're not using a capture group, we're just grabbing the entire match. There are ways to extract the capture groups but they are a bit messier. If you wanted to provide an example where such an extraction is required, I might be able to come up with something fancier.
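As one possible sketch of the capture-group case (following the `sub(pattern, "\\1", x)` idea from the comments, not a built-in feature): the hypothetical helper `gsubf_group` below reuses the same pattern to pull the first capture group out of each whole match before passing it to `f`. It assumes the pattern has at least one group and that replacing the entire match with `f`'s result is what you want.

```r
gsubf_group <- function(pattern, x, f) {
  m <- gregexpr(pattern, x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    grp <- sub(pattern, "\\1", hits)   # first capture group of every match
    as.character(f(grp))               # replacements must be character
  })
  x
}

x <- "(990283)M (31)O (29)M (6360)M"
res_group <- gsubf_group("\\((\\d+)\\)", x, function(g) as.numeric(g) + 5)
res_group
# [1] "990288M 36O 34M 6365M"
```

Here `f` only ever sees the digits, so it no longer needs the `substr()` step to strip the parentheses.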

MrFlick
  • Works fine! Hadn't thought about `regmatches <-`. Thanks a lot. And sorry for the delay, but it took me some time to figure out that `ifelse()` was vector-friendly while `if() else` wasn't... R is cool but sometimes a bit too idiosyncratic indeed! – JMD Aug 29 '14 at 11:40
  • One way to use the capture group is to use `pattern` inside `f` to extract it for further use: `v <- sub(pattern, "\\1", x)`. – jnas May 12 '16 at 10:18
  • Similar use case using capture groups: https://stackoverflow.com/a/49344399/2371031 – Brian D May 21 '20 at 22:09
2

To use a callback within a regex-capable replacement function, you may use either gsubfn or stringr functions.

When choosing between them, note that stringr is based on the ICU regex engine, whereas gsubfn uses either Tcl regular expressions by default (when the R installation has tcltk capability; otherwise it falls back to TRE) or PCRE (when you pass the perl=TRUE argument).

Also note that gsubfn gives the callback access to every capturing group in the match, while str_replace_all only lets you manipulate the whole match. Thus, for str_replace_all, the regex should look like (?<=\()\d+(?=\)), which matches one or more digits only when they are enclosed in ( and ), while keeping the parentheses themselves out of the match.
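To see what that look-around pattern actually matches without installing anything, here is a small base R sketch (using `gregexpr()` with `perl=TRUE` and `regmatches()` rather than either package; the variable names `hits` and `bumped` are only for illustration):

```r
string <- "(990283)M (31)O (29)M (6360)M"

# Look-behind/look-ahead need the PCRE engine, hence perl = TRUE
m <- gregexpr("(?<=\\()\\d+(?=\\))", string, perl = TRUE)

hits <- regmatches(string, m)[[1]]
hits
# [1] "990283" "31"     "29"     "6360"

# Because the parentheses are outside the match, they survive the replacement
bumped <- string
regmatches(bumped, m) <- lapply(regmatches(bumped, m),
                                function(d) as.character(as.integer(d) + 1L))
bumped
# [1] "(990284)M (32)O (30)M (6361)M"
```

The whole match is just the digits, which is why a whole-match callback is enough for this pattern.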

With stringr, you may use str_replace_all:

library(stringr)
string <- "(990283)M (31)O (29)M (6360)M"
## Callback function to increment found number:
f <- function(x) { as.integer(x) + 1 }
## as.character() keeps this working on stringr versions that require a character result:
str_replace_all(string, "(?<=\\()\\d+(?=\\))", function(m) as.character(f(m)))
## => [1] "(990284)M (32)O (30)M (6361)M"

With gsubfn, pass perl=TRUE and backref=0 to be able to use lookarounds and just modify the whole match:

gsubfn("(?<=\\()\\d+(?=\\))", ~ f(m), string, perl=TRUE, backref=0)
## => [1] "(990284)M (32)O (30)M (6361)M"

If you have multiple groups in the pattern, remove backref=0 and list one argument per group in the callback function declaration:

gsubfn("(\\()(\\d+)(\\))", function(m,n,o) paste0(m,f(n),o), string, perl=TRUE)
        ^ 1 ^^  2 ^^ 3 ^           ^^^^^^^          ^^^^   
Wiktor Stribiżew
  • `str_replace_all` does not appear to work any differently than `gsub` when using a function that references backreferences/capture groups. – Brian D May 21 '20 at 21:18
  • @BrianD If you can't use a callable with `str_replace_all`, you are using some very old R/stringr version. See [**online R demo**](https://tio.run/##RY3hCoIwFIX/@xQXf92LFupAEgyfQHqBgSybMpgrrgZF9Oxro6A/h@/A4TvsvTVnVvzEdWPjZiaA5IvQ7iDFpimqg6AeUJR0AqyayLWoC@rTZIqj6e7GzVwdPgheoNa9cZueNceeQQnvKBxY36wa9aCs/X3lQd@1RymRpLxk2AUkSvO/cCGYQpD3Hw) proving `stringr::str_replace_all` works as explained in the answer. There is no way to access capturing groups in the callback, but I do not claim there is in the answer. **OP uses a reference to the whole match** and `str_replace_all` does the job well. – Wiktor Stribiżew May 21 '20 at 21:27
0

This is for multiple different replacements.

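library(stringr)

text <- "foo(200) (300)bar (400)foo (500)bar (600)foo (700)bar"

# Add 5 to every extracted number (works on the whole vector of matches)
f <- function(x) {
  as.numeric(x[[1]]) + 5
}

# Split the text at each number that follows "("; \K keeps the "(" out of the match
a <- strsplit(text, "\\(\\K\\d+", perl = TRUE)[[1]]

# Extract the same numbers; stringr's ICU engine has no \K, so use a look-behind
# (perl() has been deprecated since stringr 1.0.0)
b <- f(str_extract_all(text, "(?<=\\()\\d+"))

# Interleave the text pieces with the transformed numbers
paste0(paste0(a[-length(a)], b, collapse = ""), a[length(a)])  # final output
#[1] "foo(205) (305)bar (405)foo (505)bar (605)foo (705)bar"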
sidpat
  • Thanks but no, what I'm trying to do is to replace numbers in the string directly through the function. And str_replace_all from stringr doesn't work either, probably because it's based on gsub etc. – JMD Aug 26 '14 at 13:28
  • Thanks again, but doesn't seem to work when there are multiple occurrences of \\d in the string, such as in: (990283)M (31)O (29)M (6360)M – JMD Aug 26 '14 at 13:44
0

Here's a way that tweaks stringr::str_replace() a bit: pass a lambda formula as the replacement argument, and reference the captured group not by "\\1" but by ..1. Your gsub("\\((\\d+)\\)", f("\\1"), string) then becomes str_replace2(string, "\\((\\d+)\\)", ~f(..1)), or just str_replace2(string, "\\((\\d+)\\)", f) in this simple case:

str_replace2 <- function(string, pattern, replacement, type.convert = TRUE){
  if(inherits(replacement, "formula"))
    replacement <- rlang::as_function(replacement)
  if(is.function(replacement)){
    grps_mat <- stringr::str_match(string, pattern)[,-1, drop = FALSE]
    grps_list <- lapply(seq_len(ncol(grps_mat)), function(i) grps_mat[,i])
    if(type.convert) {
      grps_list <- type.convert(grps_list, as.is = TRUE) 
      replacement <- rlang::exec(replacement, !!! grps_list)
      replacement <- as.character(replacement)
    } else {
      replacement <- rlang::exec(replacement, !!! grps_list)
    }
  }
  stringr::str_replace(string, pattern, replacement)
}

str_replace2(
  "foo (4)",
  "\\((\\d+)\\)", 
  sqrt)
#> [1] "foo 2"

str_replace2(
  "foo (4) (5)",
  "\\((\\d+)\\) \\((\\d+)\\)", 
  ~ sprintf("(%s)", ..1 * ..2))
#> [1] "foo (20)"

Created on 2020-01-24 by the reprex package (v0.3.0)

moodymudskipper