4

There are quite a few similar questions already but none of them works in my case. I have a string that contains multiple substrings inside double quotes and these substrings can contain escaped double quotes.

For example, for the string 'And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.', the expected result is an array with two elements:

  • "this is some sample text with quotes and \"escaped quotes\" inside"
  • "here is \"another\" one"

The /"(?:\\"|[^"])*"/g regex works as expected on regex101; however, when I use String#match() the result is different. Check out the snippet below:

let str = 'And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.'
let regex = /"(?:\\"|[^"])*"/g

console.log(str.match(regex))

Instead of two matches, I got four, and the text inside the escaped quotes is not even included.

MDN mentions that if the g flag is used, all results matching the complete regular expression will be returned, but capturing groups will not. To obtain capture groups with the global flag set, I would need to use RegExp.prototype.exec(). I've tried it, but the result is the same:

let str = 'And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.'
let regex = /"(?:\\"|[^"])*"/g
let temp
let matches = []

while (temp = regex.exec(str))
  matches.push(temp[0])

console.log(matches)

How could I get an array with those two matched elements?

Zsolt Meszaros

2 Answers

3

Another option is a more efficient regex that avoids the | alternation operator:

const str = String.raw`And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.`
const regex = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g
console.log(str.match(regex))

With String.raw, backslashes are kept as typed, so there is no need to double them.

See regex proof. By the way, this pattern needs 28 steps versus 267 steps for the original.
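Beyond step count, the two patterns can also differ on input where a backslash is itself escaped right before a closing quote. Here is a minimal sketch of that case, assuming a hypothetical sample string `edgeCase` that is not part of the question:

```javascript
// Hypothetical input: the first quoted substring ends in an escaped backslash (\\).
// String.raw keeps the backslashes exactly as typed.
const edgeCase = String.raw`"a\\" b "c"`;

const alternation = /"(?:\\"|[^"])*"/g;             // pattern from the question
const unrolled = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g;  // pattern from this answer

// The alternation pattern treats the second backslash plus the quote as an
// escaped quote, so it runs past the real end of the first substring.
const alternationMatches = edgeCase.match(alternation);

// The unrolled pattern consumes each backslash together with the character
// it escapes, so the quote after \\ is recognized as the closing one.
const unrolledMatches = edgeCase.match(unrolled);

console.log(alternationMatches.length, alternationMatches);
console.log(unrolledMatches.length, unrolledMatches);
```

On this input the unrolled pattern should report two quoted substrings, while the alternation pattern reports a single, overly long one. Whether that matters depends on whether escaped backslashes can occur in your data.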

EXPLANATION

--------------------------------------------------------------------------------
  "                        '"'
--------------------------------------------------------------------------------
  [^"\\]*                  any character except: '"', '\\' (0 or more
                           times (matching the most amount possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \\                       '\'
--------------------------------------------------------------------------------
    [\s\S]                   any character of: whitespace (\n, \r,
                             \t, \f, and " "), non-whitespace (all
                             but \n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
    [^"\\]*                  any character except: '"', '\\' (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  "                        '"'
Ryszard Czech
2

The regex doesn't work as expected because, inside a JavaScript string literal, a single backslash is an escape character, so \" produces just a double quote. You'll need to escape the backslashes in the text:

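// In a regular string literal, \" is only an escape for ", so no backslash ends up in the string.
let str = 'And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.';
let regex = /"(?:\\"|[^"])*"/g;

console.log(str); // printed without any backslashes
console.log(str.match(regex)); // four matches instead of two

// With \\, each backslash survives into the string, so the regex can see the escaped quotes.
str = 'And then, "this is some sample text with quotes and \\"escaped quotes\\" inside". Not that we need more, but... "here is \\"another\\" one". Just in case.';

console.log(str); // printed with the backslashes intact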
console.log(str.match(regex))
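
To make the difference visible without the regex, this short sketch compares what each literal form actually stores. The variable names and the shortened sample text are made up for illustration:

```javascript
// \" in a regular literal is an escape sequence for a plain double quote.
const singleEscaped = 'a \"b\" c';

// \\ stores one literal backslash, so the quote stays "escaped" in the data.
const doubleEscaped = 'a \\"b\\" c';

// String.raw keeps backslashes exactly as typed in the template literal.
const rawString = String.raw`a \"b\" c`;

console.log(singleEscaped.includes('\\')); // false: the backslashes are gone
console.log(doubleEscaped.includes('\\')); // true
console.log(doubleEscaped === rawString);  // true: both hold the same characters
console.log(singleEscaped.length, doubleEscaped.length); // 7 9
```

If the text comes from a file, a network response, or user input rather than from a literal in source code, the backslashes are already part of the data and no extra escaping is needed.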
vanowm