I have an array of regular expression strings. One of them must match any string literal found in a given Java file.

This is the regex string I have so far: "(\").*[^\"].*(\")"

However, the string "Hello\"good day" is rejected even though the quotation mark inside the string is escaped. I think what I have immediately rejects the string literal when it finds a quotation mark inside regardless of whether it is escaped or not. I need it to accept string literals with escaped quotes but it should reject "Hello"Good day".

  Pattern regex = Pattern.compile("(\").*[^\"].*(\")", Pattern.DOTALL);
  Matcher matcher = regex.matcher("Hello\"good day");
  matcher.find(0); //false
pythonbeginner4556
  • Please post a [MCVE]. – Sotirios Delimanolis May 04 '16 at 15:54
  • You probably want to put a negative look-behind on the `"` character. But you will have a hard time dealing with comments. – aioobe May 04 '16 at 15:58
  • Also you say '"Hello\"good day" is rejected' and then you say 'but it should reject "Hello"Good day"'. That means it's working. – PeterS May 04 '16 at 15:59
  • *I need it to accept string literals with escaped quotes but it should reject `"Hello"Good day"`* - you must mean a regex like `String pat = "\"[^\\\\\"]*(?:\\\\.[^\"\\\\]*)*\""` and use it with `String#matches()`. EDIT: See what anubhava has just posted. – Wiktor Stribiżew May 04 '16 at 15:59
  • Do you need to worry about other escape sequences? `\n`, `\t`, `\u1234`? – Jeffrey May 04 '16 at 16:01
  • I do not need to worry about other escape sequences or comments. Just the quote escape – pythonbeginner4556 May 04 '16 at 16:05
  • Can you please try `String pat = "^(?:[^\"\\\\]|\\\\.|\"[^\\\\\"]*(?:\\\\.[^\"\\\\]*)*\")*$"`? – Wiktor Stribiżew May 04 '16 at 16:27
  • Pattern regex = Pattern.compile("^(?:[^\"\\\\]|\\\\.|\"[^\\\\\"]*(?:\\\\. [^\"\\\\]*)*\")*$", Pattern.DOTALL); Matcher matcher = regex.matcher("Hello\"good day"); boolean result=matcher.find(0); //I tried this, and result is false – pythonbeginner4556 May 04 '16 at 16:30
  • @pythonbeginner4556: Shouldn't it be false? It only has 1 double quote that is not escaped. Check [this demo](http://ideone.com/YM8If3). What if you reverse the logic? Will it work as expected then? – Wiktor Stribiżew May 04 '16 at 16:50

1 Answer

In Java you can use this regex to match a string literal between " and " that may contain escaped quotes:

boolean valid = input.matches("\"[^\"\\\\]*(\\\\.[^\"\\\\]*)*\"");

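The regex being used is shown below (String#matches() anchors the pattern implicitly, which is why ^ and $ are omitted in the Java code):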

^"[^"\\]*(\\.[^"\\]*)*"$

Breakup:

^             # line start
"             # match literal "
[^"\\]*       # match 0 or more of any char that is not " and \
(             # start a group
   \\         # match a backslash \
   .          # match any character after \
   [^"\\]*    # match 0 or more of any char that is not " and \
)*            # group end, and * makes it possible to match 0 or more occurrences
"             # match literal "
$             # line end
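
As a quick check of the breakup above, here is a minimal sketch that runs the same regex against the inputs from the question. The class and method names (`StringLiteralCheck`, `isStringLiteral`) are made up for illustration. Note that each test input needs its own surrounding quotes escaped at the Java level.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class StringLiteralCheck {
    // Same regex as above: opening ", ordinary chars, then any number of
    // (backslash + any char + ordinary chars), then closing ".
    private static final Pattern STRING_LITERAL =
            Pattern.compile("\"[^\"\\\\]*(\\\\.[^\"\\\\]*)*\"");

    // matches() requires the whole input to match, like String#matches().
    static boolean isStringLiteral(String input) {
        return STRING_LITERAL.matcher(input).matches();
    }

    public static void main(String[] args) {
        String escaped = "\"Hello\\\"good day\"";  // text: "Hello\"good day"
        String unescaped = "\"Hello\"Good day\"";  // text: "Hello"Good day"
        String oneQuote = "Hello\"good day";       // text: Hello"good day

        System.out.println(isStringLiteral(escaped));    // true
        System.out.println(isStringLiteral(unescaped));  // false
        System.out.println(isStringLiteral(oneQuote));   // false
    }
}
```

The third input is the one used in the question's code: as a Java literal, "Hello\"good day" holds only a single quote character and no backslash, so it cannot match a pattern that expects an opening and a closing quote.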

anubhava
  • Pattern regex = Pattern.compile("\"([^\"\\\\]*)(\\\\.[^\"\\\\]*)*\"", Pattern.DOTALL); Matcher matcher = regex.matcher("Hello\"good day"); boolean result=matcher.find(0); //I get false when I use your regex string this way. How can I make it work – pythonbeginner4556 May 04 '16 at 16:09
  • Slightly quicker if you use a cluster (non-capturing) group instead of a capturing group. –  May 04 '16 at 17:10
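
To tie the last two comments together, here is a rough sketch that uses a non-capturing group `(?:...)` and `Matcher#find()` to pull string literals out of a line of source text. The class and method names (`LiteralFinder`, `findLiterals`) and the sample input are assumptions for illustration only. It probably also explains the `false` result reported above: the input `"Hello\"good day"` in that comment is a Java literal containing a single quote character, so there is no quoted span for `find()` to locate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
final class LiteralFinder {
    // Same pattern as the answer, with (?:...) instead of a capturing group.
    private static final Pattern STRING_LITERAL =
            Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"");

    // Collects every substring that looks like a double-quoted literal.
    static List<String> findLiterals(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = STRING_LITERAL.matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Text as it would appear in a source file:
        //   String s = "Hello\"good day"; String t = "x";
        String line = "String s = \"Hello\\\"good day\"; String t = \"x\";";
        for (String literal : findLiterals(line)) {
            System.out.println(literal);
        }

        // The input from the comment: only one quote character, so no match.
        System.out.println(findLiterals("Hello\"good day").isEmpty());
    }
}
```

As noted in the earlier comments, this ignores Java comments and character literals, so it is only suitable when those do not need to be handled.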