As of version 4.0, R supports a special syntax for raw strings, but how can it be used together with string interpolation? That would be very useful for passing raw regular expressions, e.g., 123\b instead of 123\\b. I've tried using glue:

> tmp = "123\b"
> str_detect("123 4", glue(r"[{tmp}]"))
[1] FALSE

Using a raw string directly does work:

> str_detect("123 4", r"[123\b]")
[1] TRUE
dimid

1 Answer

The problem here is that after tmp is defined, it is too late to have the \b interpreted as a literal sequence of characters. The character string is stored internally as the byte sequence 31 32 33 08, not the byte sequence 31 32 33 5c 62, which is what you would need for your example to work.
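The difference between the two byte sequences can be checked directly. The following is a minimal sketch in base R (R 4.0 or later for the raw string syntax); the names `escaped`, `raw_lit`, and `pattern` are illustrative, and `sprintf()` is used only as a base R stand-in for glue-style interpolation. It suggests that a raw string works fine as a template, provided the interpolated value does not already contain a parsed escape:

```r
# Requires R >= 4.0 for the r"[...]" raw string syntax.
escaped <- "123\b"     # the parser turns \b into a single backspace byte
raw_lit <- r"[123\b]"  # the backslash and the b are kept as two characters

charToRaw(escaped)
#> [1] 31 32 33 08
charToRaw(raw_lit)
#> [1] 31 32 33 5c 62

# Interpolating a plain value into a raw template keeps the template's
# backslash intact, because the backslash is never seen by the parser
# as an escape.
pattern <- sprintf(r"[%s\b]", "123")
identical(pattern, raw_lit)
#> [1] TRUE

# perl = TRUE so that \b is treated as a word boundary.
grepl(pattern, "123 4", perl = TRUE)
#> [1] TRUE
```

In other words, if `tmp` can be defined as `r"[123\b]"` in the first place, no conversion step is needed.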

If you have existing character strings you wish to use in this way, you need to convert the escape sequences back into literal backslash-character pairs before you use them. One fairly hacky way to do this is to use the console's printing method itself.

As you showed yourself, this doesn't work:

tmp <- "123\b"

charToRaw(tmp)
#> [1] 31 32 33 08

stringr::str_detect("123 4", tmp)
#> [1] FALSE

But if we write a little wrapper around capture.output, we can get the characters that R needs to replicate the original intended string:

f <- function(x) substr(capture.output(noquote(x)), 5, 1e4)

charToRaw(f(tmp))
#> [1] 31 32 33 5c 62

stringr::str_detect("123 4", f(tmp))
#> [1] TRUE

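So the function f can be thought of as a way of recovering the literal backslash sequences from a string that has already been parsed. The new raw string syntax can't really help here, because it only changes how string literals in source code are read, not the contents of strings that already exist.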
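A possibly tidier variant of the same idea, offered as a sketch rather than something tested above: base R's `encodeString()` is documented to escape strings the same way `print.default` does, so it may serve in place of capturing the console output. The name `g` is hypothetical, and `grepl(..., perl = TRUE)` stands in for `stringr::str_detect` to keep the example free of package dependencies.

```r
tmp <- "123\b"  # contains a backspace byte, not a backslash

# Hypothetical helper: re-escape non-printable characters as
# backslash sequences, without going through capture.output().
g <- function(x) encodeString(x)

charToRaw(g(tmp))
#> [1] 31 32 33 5c 62

# perl = TRUE so that \b is treated as a word boundary.
grepl(g(tmp), "123 4", perl = TRUE)
#> [1] TRUE
```

One caveat applies to this kind of re-escaping in general: backslashes that are already literal in the string are escaped again, so a pattern that was correctly written as `"123\\b"` would come out with a doubled backslash. It is best reserved for strings known to contain unintended escapes.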

Created on 2021-10-24 by the reprex package (v2.0.0)

Allan Cameron
  • Thanks, could you explain what the constants `5` and `1e4` are? – dimid Oct 24 '21 at 17:30
  • @dimid Capturing the console output includes the "[1] " prefix at the start of the printed line. Calling substr with a start of 5 removes this, but substr also needs an end value; 1e4 simply allows for a very long string. – Allan Cameron Oct 24 '21 at 17:45
  • Thanks, that's definitely a valid approach. I was hoping for something more elegant à la Python's `rf"{tmp}"`, but I guess it's not possible yet. – dimid Oct 25 '21 at 07:43
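
On the constants `5` and `1e4` discussed in the comments: a small sketch, assuming a single-element input, of how the end value could be dropped by using `substring()`, whose `last` argument has a large default. The names `printed` and `f2` are illustrative only.

```r
tmp <- "123\b"

# The captured console line starts with the "[1] " index prefix,
# which is why the useful text begins at character 5.
printed <- capture.output(noquote(tmp))
startsWith(printed, "[1] ")
#> [1] TRUE

# substring() defaults to a very large `last`, so no explicit end is needed.
f2 <- function(x) substring(capture.output(noquote(x)), 5)

identical(f2(tmp), substr(printed, 5, 1e4))
#> [1] TRUE
```

Both versions rely on the input being a single string that prints on one line; a longer vector would print with different index prefixes and would need different handling.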