How can I use multiple backreferences in a function to produce the replacement in stringr functions, for example, in stringr::str_replace()?

An example: suppose I want each captured number rounded to a whole number and the results concatenated into one string (this particular function is just an example; the important thing is that it accepts more than one backreference).

I have tried some variations on the following without success:

round_concat <- function(x, y) { paste(round(as.numeric(x), 0), round(as.numeric(y), 0)) }

library(stringr)
"ABC 23.3 text 105.43 more text" %>% str_replace_all(., "(\\d+)(\\.)(\\d+)", round_concat("\\1", "\\2"))

Note: I have looked for similar functionality in functions like base::gsub (see here) but without luck

stevec

2 Answers

If you want to apply a function to the replacement backreference you could do:

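# as.character() keeps the function's return value a character vector
prices %>% str_replace_all(., "(\\d+\\.\\d+)", function(x) as.character(round(as.numeric(x))))

Example:

library(stringr)
prices <- c("tomato: 12.23", "potato: 9.53")
prices %>% str_replace_all(., "(\\d+\\.\\d+)", function(x) as.character(round(as.numeric(x))))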

[1] "tomato: 12" "potato: 10"

In this case, the match is first converted to a number and then rounded to the nearest integer.

Or even:

str_replace_all(prices, "(\\d+\\.\\d+)", function(x) as.character(nchar(x)))
[1] "tomato: 5" "potato: 4"
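
If several capture groups are needed, here is a minimal sketch of one possible workaround. It assumes the replacement function only ever receives the full match, so the groups are recovered inside the function by re-applying the same pattern with str_match(). The names pattern and round_concat are illustrative only, and the pattern is simplified to two groups.

```r
library(stringr)

# Two capture groups: integer part and fractional part
pattern <- "(\\d+)\\.(\\d+)"

# Illustrative helper: receives the full match(es) and re-applies the
# pattern to recover the individual capture groups
round_concat <- function(m) {
  g <- str_match(m, pattern)  # column 1 = full match, columns 2 and 3 = groups
  paste(round(as.numeric(g[, 2])), round(as.numeric(g[, 3])))
}

result <- str_replace_all("ABC 23.3 text 105.43 more text", pattern, round_concat)
print(result)
# [1] "ABC 23 3 text 105 43 more text"
```

Because str_match() and paste() are vectorised, the helper behaves the same whether it is handed one match at a time or all matches at once.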
R. Schifini
  • This is nice! I never realized you could use a function in `replacement` – Rich Scriven Feb 08 '19 at 05:02
  • I didn't know of this functionality either. Very cool. However, I would like to be able to refer to different parts of the regex using `\\1`, `\\2`, `\\3` etc. Like what is described from 4m to 4m10s [here](https://www.youtube.com/watch?v=FCFdgymqpUI#t=4m0s). Is this possible? – stevec Feb 08 '19 at 05:13
  • @RichScriven Me neither! Just tried it and was surprised as well. :) – R. Schifini Feb 08 '19 at 05:14
  • I don't know if you can apply different functions to different captured groups, but if there is one function it is applied to all captured groups. – R. Schifini Feb 08 '19 at 05:17
This is what I ended up using (although I'd still love to know if multiple backrefs can be used in stringr::str_replace()).

Importantly, the solution below allows multiple backreferences to be passed to the replacement function:

library(gsubfn)
library(magrittr)  # provides %>%
"This string 24.45,32 contains numbers 67.0.5,150 lots of them" %>% 
  gsubfn("(\\d+)\\.(\\d+),(\\d+)",  ~ { paste(as.numeric(x) * 2,  as.numeric(y) * 0.5,  as.numeric(z) + 7 ) }, . , backref = -3)

# [1] "This string 48 22.5 39 contains numbers 67.0 2.5 157 lots of them"

There are a few things to note here:

  • x, y and z are the arguments of the replacement function; each one receives one of the regex capturing groups
  • You can name the arguments whatever you want; they are filled in the order the capturing groups (i.e. ()) appear in the regular expression
  • backref = -3 tells gsubfn() to pass 3 backreferences but not the match itself (see here)
  • Changing the -3 to 3 makes gsubfn() pass the full match as the first argument as well, so the function must accept four arguments; otherwise it throws an unused argument error
  • The above example uses 3 arguments, but you can use as many as you want
  • Don't forget the ~, which makes the replacement a formula that gsubfn() converts to a function
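
The points above can also be sketched with an ordinary function instead of a formula. This is only an illustration: scale_parts, input, result and with_match are made-up names, and the first call relies on backref defaulting to the negative of the number of capture groups when it is not specified.

```r
library(gsubfn)

pattern <- "(\\d+)\\.(\\d+),(\\d+)"
input <- "This string 24.45,32 contains numbers"

# Illustrative replacement function: one argument per capture group
scale_parts <- function(x, y, z) {
  paste(as.numeric(x) * 2, as.numeric(y) * 0.5, as.numeric(z) + 7)
}

# backref omitted: only the three capture groups are passed
result <- gsubfn(pattern, scale_parts, input)
print(result)
# [1] "This string 48 22.5 39 contains numbers"

# backref = 3: the full match is passed first, followed by the three groups,
# so the function needs four arguments
with_match <- gsubfn(pattern, function(m, x, y, z) paste0("<", m, ">"), input, backref = 3)
print(with_match)
# [1] "This string <24.45,32> contains numbers"
```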
stevec
  • Note that the `backref=` argument is automatically set to the negative of the number of capture groups if it is not specified, so it could be omitted in this case. Also, the braces in the formula could be omitted. – G. Grothendieck Feb 20 '19 at 10:24