I have written a regex to match sentences with quotation marks on both sides in a single line:
(?<!")"([^"\r]+)"(?!")

Input Text:
The sign said, "Walk." Then it said, "Don't Walk", then, "Walk", all within thirty seconds. He yelled, "Hurry up."

Match 1: "Walk."
Match 2: "Don't Walk"
Match 3: "Walk"
Match 4: "Hurry up."

Now I want to keep only the matches that have a single space after the opening quotation mark.

I tried adding (\ {1}) inside the regex after the first quotation mark. Now it looks like this:
(?<!")"((\ {1})[^"\r]+)"(?!")

My new match is:
Match 1: " Then it said, "

But I expected no matches, because none of my earlier four matches has a space after its opening quotation mark.

Now the whole thing is broken: the regex ignores the original pairing and treats every quotation mark independently, so a closing quotation mark followed by a space is taken as an opening one.
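
For reference, here is a minimal sketch that reproduces both results. It assumes Python's built-in re module, which the question does not name; the variable names are only illustrative.

```python
import re

# The input text from the question.
text = ('The sign said, "Walk." Then it said, "Don\'t Walk", then, "Walk", '
        'all within thirty seconds. He yelled, "Hurry up."')

# The two patterns from the question, copied unchanged.
original = re.compile(r'(?<!")"([^"\r]+)"(?!")')
with_space = re.compile(r'(?<!")"((\ {1})[^"\r]+)"(?!")')

# Content of each quoted string found by the first pattern.
first = [m.group(1) for m in original.finditer(text)]
# Full match (quotes included) for the second pattern.
second = [m.group(0) for m in with_space.finditer(text)]

print(first)   # ['Walk.', "Don't Walk", 'Walk', 'Hurry up.']
print(second)  # ['" Then it said, "']
```

The second pattern starts at the closing quote of "Walk." because nothing in the regex records that this quote has already been used to close an earlier pair.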

Any idea how to resolve this?

Thanks

Wiktor Stribiżew
amirvf
  • Is this `"([^"\r\n]+)"` all that is needed? –  Jun 10 '20 at 22:15
  • <-- quote over there `" Then it said, "` quote over here --> You need balanced-quote processing. That is a can of worms; how badly do you need it? –  Jun 10 '20 at 22:16
  • It seems that "...a single space" has been interpreted to mean "...at least one space" rather than "...exactly one space". Please clarify. – Cary Swoveland Jun 15 '20 at 17:21
  • In general, questions are clearest when they begin with a statement of the problem, followed, if helpful, with one or more illustrating examples, with the desired result shown for each. Only then present code in need of repair. Here you *might* say you wish to "extract the text between two successive double quotes when the opening quote is followed by exactly 1 space". I say "might" because I don't know exactly what you want. For example, your code contains one or two capture groups. I can't tell if those capture groups are desired or you have assumed they are needed, when matches may suffice. – Cary Swoveland Jun 15 '20 at 18:25
  • I asked you what you mean by "a single space". I am still waiting for your answer. – Cary Swoveland Jun 17 '20 at 05:06
  • @amir Please kindly upvote my answer if it helped you. – Ryszard Czech May 12 '21 at 21:32

3 Answers

The problem is that the double quote is both your opening and your closing delimiter, so the regex cannot tell which role a given quote plays.

Use a PCRE regex:

(?<!")"(?!\ )([^"\r]+)"(?!")(*SKIP)(*F)|(?<!")"\ ([^"\r]+)"(?!")

See proof. The first alternative, (?<!")"(?!\ )([^"\r]+)"(?!")(*SKIP)(*F), matches double-quoted strings that do not have a space after the opening ", and skips them. The second alternative, (?<!")"\ ([^"\r]+)"(?!"), then fetches the expected matches.
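
The (*SKIP)(*F) verbs are PCRE-specific and are not available in, for example, Python's built-in re module. As a rough sketch of the same idea without them (this is a swapped-in technique, not the regex above), one can consume every balanced quoted string with the question's original pattern and then filter on the first character. The function name and the modified sample are hypothetical.

```python
import re

# The question's original pattern: consumes one balanced quoted string at a time.
QUOTED = re.compile(r'(?<!")"([^"\r]+)"(?!")')

def quoted_with_leading_space(text):
    """Return quoted strings (quotes included) whose content starts with a space."""
    return [m.group(0) for m in QUOTED.finditer(text)
            if m.group(1).startswith(" ")]

original = ('The sign said, "Walk." Then it said, "Don\'t Walk", then, "Walk", '
            'all within thirty seconds. He yelled, "Hurry up."')
# Hypothetical variant with a space inserted after one opening quote.
modified = original.replace('"Don\'t Walk"', '" Don\'t Walk"')

print(quoted_with_leading_space(original))  # []
print(quoted_with_leading_space(modified))  # ['" Don\'t Walk"']
```

Because every quoted string is consumed whether or not it is kept, a closing quote is never reused as an opening one, which is the effect the skip branch is meant to achieve in PCRE.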

Ryszard Czech

“Inside quotes” can be asserted with a lookahead that requires the number of quote characters following the match to be even:

" [^"]*"(?=(([^"]*"){2})*[^"]*$)

See live demo (I added a space in front of Don't Walk to prove the regex does find quoted text starting with a space).

Note that you do not need to escape a space char, and a quantifier of {1} may be deleted without affecting the result.
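
A minimal sketch of this lookahead in use, assuming Python's re module (the answer does not name an engine) and a hypothetical modified input with a space added before Don't Walk:

```python
import re

# The pattern from this answer, copied unchanged. The lookahead only succeeds
# when an even number of double quotes remains after the closing quote.
EVEN_QUOTES_AHEAD = re.compile(r'" [^"]*"(?=(([^"]*"){2})*[^"]*$)')

original = ('The sign said, "Walk." Then it said, "Don\'t Walk", then, "Walk", '
            'all within thirty seconds. He yelled, "Hurry up."')
# Hypothetical variant with a space inserted after one opening quote.
modified = original.replace('"Don\'t Walk"', '" Don\'t Walk"')

# group(0) is used because the pattern contains capture groups.
on_original = [m.group(0) for m in EVEN_QUOTES_AHEAD.finditer(original)]
on_modified = [m.group(0) for m in EVEN_QUOTES_AHEAD.finditer(modified)]

print(on_original)  # []
print(on_modified)  # ['" Don\'t Walk"']
```

In the original input, the candidate " Then it said, " is rejected because five quotes follow it. Note that the parity check relies on the quotes in the whole string being balanced; a stray quote anywhere after the match flips the result.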

Bohemian
  • If "...matches which include a single space after opening quotation mark" is interpreted to mean "exactly one space" there seems to be a problem. I've asked for clarification. – Cary Swoveland Jun 15 '20 at 17:17

If your objective is to obtain the text between successive double quotes when there is exactly one space after the opening quote, you could match the pattern:

(?<=") (?! )[^"\r\n]+(?=")

Start your engine!

If the space following the opening quote is not to be part of the string matched, change the regex to the following.

(?<=" )(?! )[^"\r\n]+(?=")
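
As a rough illustration, the sketch below runs both patterns in Python's re module (an assumption; any engine with lookbehind should behave similarly). The sample string is hypothetical and was chosen so that no closing quote is followed by a space.

```python
import re

# The two patterns from this answer, copied unchanged.
WITH_SPACE = re.compile(r'(?<=") (?! )[^"\r\n]+(?=")')
WITHOUT_SPACE = re.compile(r'(?<=" )(?! )[^"\r\n]+(?=")')

# Hypothetical sample: one, two and zero spaces after the opening quote.
sample = 'x=" one",y="  two",z="three"'

print(WITH_SPACE.findall(sample))     # [' one']
print(WITHOUT_SPACE.findall(sample))  # ['one']

# The input text from the question.
question_input = ('The sign said, "Walk." Then it said, "Don\'t Walk", then, '
                  '"Walk", all within thirty seconds. He yelled, "Hurry up."')

print(WITH_SPACE.findall(question_input))  # [' Then it said, ']
```

Since the lookarounds do not consume the quotes, any two successive quotes qualify, including a closing quote followed by the next opening quote. That is why the question's input still yields the text between "Walk." and "Don't Walk".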
Cary Swoveland