
What I mean is that I need a regular expression that can match either something like this...

"I am a sentence."

or something like this...

"I am a sentence.

(notice the missing quotation mark at the end of the second one). My attempt at this so far is

["](\\.|[^"])*["]*

but that isn't working. Thanks for the help!

Edit for clarity: I intend this to be something like a C-style string. I want the pattern to match a string even if the string is not properly closed.

user3047641

1 Answer


You could write the pattern as:

["](\\.|[^"\n])*["]?

which only has two small changes:

  • It excludes newline characters inside the string, so that an unterminated string only matches up to the end of the line. (In flex, . does not match a newline, but a negated character class does, unless the newline is explicitly excluded from it.)

  • It makes the closing double quote optional rather than arbitrarily repeated.
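To see what the newline exclusion changes, here is a small sketch that runs both the question's pattern and the revised one through POSIX `regcomp`/`regexec` instead of flex. That substitution is an assumption made to keep the example self-contained: POSIX extended regexes also report the longest match at a given starting position, as flex does, but POSIX `.` matches a newline (unlike flex's `.`), which only matters for a backslash placed right before a line break. `match_len_at_start`, `QUESTION_PATTERN` and `ANSWER_PATTERN` are hypothetical names.

```c
#include <regex.h>

/* Hypothetical names. Both patterns are POSIX ERE written as C string literals.
   QUESTION_PATTERN is the asker's  "(\\.|[^"])*"*
   ANSWER_PATTERN   is the revised  "(\\.|[^"\n])*"?
   The \n inside the bracket becomes a real newline character, which a POSIX
   bracket expression treats literally, so the effect matches flex's [^"\n]. */
static const char QUESTION_PATTERN[] = "\"(\\\\.|[^\"])*\"*";
static const char ANSWER_PATTERN[]   = "\"(\\\\.|[^\"\n])*\"?";

/* Returns the length of the longest match of `pattern` that starts at text[0],
   or -1 if there is none. regexec reports the leftmost-longest match, so a
   match starting at offset 0 is the one flex would take at that position. */
int match_len_at_start(const char *pattern, const char *text) {
    regex_t re;
    regmatch_t m;
    int len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 1, &m, 0) == 0 && m.rm_so == 0)
        len = (int)(m.rm_eo - m.rm_so);
    regfree(&re);
    return len;
}
```

With the input `"I am a sentence.` followed by a newline and `x = "y"`, the revised pattern should stop after the 17 characters on the first line, while the question's pattern runs on through the newline and up to the next quotation mark (23 characters).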

However, it is hard to imagine a use case in which you would want to silently ignore the error. So I would recommend writing two rules:

["](\\.|[^"\n])*["]   { /* valid string */ }
["](\\.|[^"\n])*      { /* invalid string */ }

Note that a properly closed string is guaranteed to be matched by the first rule, because that rule matches one more character (the closing quote) than the second one, and (f)lex always chooses the longer match.
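The rule selection can be sketched in C by trying both patterns at the same position and keeping the longer match, with ties going to the rule listed first (flex's documented tie-break). This is only an emulation under an assumption: it uses POSIX `regcomp`/`regexec` and tests the rules one at a time, whereas flex compiles all rules into a single automaton. `RULES`, `rule_match_len` and `pick_rule` are hypothetical names.

```c
#include <regex.h>

/* The two rules from the answer, in flex order, as POSIX ERE C string literals. */
static const char *const RULES[2] = {
    "\"(\\\\.|[^\"\n])*\"",  /* rule 0: valid (closed) string        */
    "\"(\\\\.|[^\"\n])*",    /* rule 1: invalid (unterminated) string */
};

/* Length of the longest match of `pattern` starting at text[0], or -1. */
static int rule_match_len(const char *pattern, const char *text) {
    regex_t re;
    regmatch_t m;
    int len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 1, &m, 0) == 0 && m.rm_so == 0)
        len = (int)(m.rm_eo - m.rm_so);
    regfree(&re);
    return len;
}

/* Emulates flex's choice at one input position: the longest match wins and a
   tie goes to the earlier rule. Returns the winning rule index (-1 if no rule
   matches) and stores the matched length in *len. */
int pick_rule(const char *text, int *len) {
    int best = -1;

    *len = -1;
    for (int i = 0; i < 2; i++) {
        int n = rule_match_len(RULES[i], text);
        if (n > *len) {  /* strictly longer only, so earlier rules keep ties */
            best = i;
            *len = n;
        }
    }
    return best;
}
```

A closed string such as `"I am a sentence."` should select rule 0 with length 18, while the same text without the closing quote selects rule 1 with length 17: the single extra character is what makes the valid rule win.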

Also, writing two overlapping rules like that does not cause any execution overhead, because of the way (f)lex compiles the patterns. In effect, the common prefix is automatically factored out.

rici