
I'm trying to get the location of \ or / in a string. Below is the code I'm attempting:

x <- "<span id=\"ref_12590587_l\">6,803.61</span>_l>"
gregexpr("\\\", x)
which(strsplit(x, "")[[1]]=="\")

My problem is that when I run this code in RStudio, I get a continuation prompt: the REPL prompt becomes +. The same code works for other characters.

Why am I getting the continuation prompt, even though the \ is inside quotation marks?

Edit: corrected the string after a comment.

Wiktor Stribiżew
Frash
  • You do not have "\" in your input string. If you want to look for a `/`, just use `gregexpr("/", x)`, and if you want to look for "\", use `gregexpr("\\\\", x)`. – Wiktor Stribiżew Jun 08 '15 at 09:56
  • Edited the string to the correct one. @stribizhev Would you mind explaining the four backslashes in `\\\\`, please? – Frash Jun 08 '15 at 10:12
  • `gregexpr` expects a regular expression. In regular expressions, "\" is a special symbol, so it must be escaped for the regex engine. But the pattern reaches `gregexpr` as an R string literal, and R string literals themselves use "\" to escape characters like `\n`. So we need to escape the backslash for R first, and then for the regex engine. – Wiktor Stribiżew Jun 08 '15 at 10:20
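A small sketch of the two escaping layers described in that comment. The string `s` and the variable `pattern` are hypothetical, not from the post: the R parser turns the literal `"\\"` into a single backslash, and the R literal `"\\\\"` hands the regex engine the two-character pattern `\\`, which matches that one backslash.

```r
# Hypothetical example string (not from the post). The literal "a\\b"
# holds three characters: a, one backslash, b.
s <- "a\\b"
nchar(s)            # 3: the R parser turned "\\" into one backslash
writeLines(s)       # prints a\b

# The R literal "\\\\" hands the regex engine the two-character pattern \\,
# which the engine reads as "one literal backslash".
pattern <- "\\\\"
nchar(pattern)      # 2
regexpr(pattern, s) # match starts at position 2
```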

2 Answers


You have to add another backslash (as stribizhev says in the comments). So you're looking for

gregexpr("\\\\", x)

The reason is that you need to escape the \ twice: once for R's string parser and once for the regex engine. In an R string literal, \\ gives you only one backslash. When you put three in, the third backslash actually escapes the closing quote!

See, for example:

gregexpr("\"", 'hello, "hello"')

This is searching for the quote in the string.
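To see where the third backslash goes, here is a sketch with a hypothetical literal `p` that has three backslashes, a quote, and then a proper closing quote. It parses as just two characters, a backslash and a quotation mark, which is why `"\\\"` on its own never closes the string.

```r
# Hypothetical literal: escaped backslash (\\), escaped quote (\"),
# then the real closing quote
p <- "\\\""
nchar(p)              # 2
writeLines(p)         # prints \"
strsplit(p, "")[[1]]  # the two characters: a backslash and a quote
```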

nsheff
  • Thanks for the explanation. Is there any source that covers more nuances like this? – Frash Jun 08 '15 at 10:15
  • see [intro to R character vectors](http://cran.r-project.org/doc/manuals/r-release/R-intro.html#Character-vectors) for a start... – nsheff Jun 08 '15 at 10:18

Just to formalize my comments:

  1. Your x variable does not contain any backslashes; the backslashes in the literal are escape characters that let us put literal quotation marks into a string.
  2. gregexpr("\\\", x) contains an unterminated string literal: the third backslash escapes the quotation mark on the right, so it is treated as a literal quotation mark rather than the one that closes the string literal. R keeps waiting for a closing quote, hence the + continuation prompt.
  3. To search for a literal \ with gregexpr, we need four backslashes, \\\\, because gregexpr expects a regular expression. In regular expressions, "\" is a special symbol, so it must be escaped for the regex engine. But the pattern is passed to gregexpr as an R string literal, which itself uses \ to escape characters like \n. So we need to escape the backslash for R first, and then for the regex engine.
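Point 2 also explains why the question's `which(strsplit(x, "")[[1]]=="\")` line hangs: `"\"` is unterminated too. Because `==` compares plain characters and involves no regex, two backslashes are enough there. A sketch, assuming a hypothetical string `y` that actually contains slashes (the question's `x` has no backslash):

```r
# Hypothetical string: p a t h \ t o / f i l e
y <- "path\\to/file"
chars <- strsplit(y, "")[[1]]

# "\\" is one backslash; no regex layer, so no second round of escaping
which(chars == "\\")            # 5
which(chars %in% c("\\", "/"))  # 5 8 (backslash and forward slash)
```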

That said, you can use

gregexpr("\\\\", x) 

to get only literal backslashes, or

gregexpr("\\\\|/", x)

to also look for forward slashes.
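A quick check of both patterns, assuming a hypothetical string `y` that contains one of each slash. The `fixed = TRUE` variant is an alternative not mentioned in the answers: it makes `gregexpr` treat the pattern as plain text, so only R's own escaping layer applies.

```r
# Hypothetical string that really contains both kinds of slash
y <- "path\\to/file"

# Regex route: four backslashes for a backslash, alternation to add "/"
as.integer(gregexpr("\\\\|/", y)[[1]])            # 5 8

# Alternative: fixed = TRUE skips the regex layer, so "\\" is enough
as.integer(gregexpr("\\", y, fixed = TRUE)[[1]])  # 5

# The question's x contains no backslash, so gregexpr reports no match (-1)
x <- "<span id=\"ref_12590587_l\">6,803.61</span>_l>"
as.integer(gregexpr("\\\\", x)[[1]])              # -1
```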


Wiktor Stribiżew