I want to parse any regex that is enclosed in double quotes, for example "([A-Z]+[A-Z]+[C])".

This is what I have tried so far, in Scala with the fastparse library:

  import fastparse._, NoWhitespace._

  def regex[_: P]: P[Unit] = P(AnyChar.rep).log
  def quotedRegex[_: P]: P[Unit] = P("\"" ~ regex ~ "\"").log

  val Parsed.Failure(label, index, extra) = parse(""""str"""", quotedRegex(_))

But the parse fails, with this log output:

+quotedRegex:1:1, cut
  +regex:1:2, cut
  -regex:1:2:Success(1:6, cut)
-quotedRegex:1:1:Failure(quotedRegex:1:1 / "\"":1:6 ..."\"str\"", cut)
label = "\""
index = 5
trace = TracedFailure((any-character | "\""),(any-character | "\""),Parsed.Failure(Expected "\"":1:6, found ""))

As far as I understand, the regex parser consumes the closing double quote as well, so nothing is left for quotedRegex to match. I can't figure out how to avoid that. I presume I need some kind of lookahead that stops the regex parser before the last character, but I'm not sure how to write it.

Please help.

iamsmkr

1 Answer

For a negative lookahead, use the ! operator. !"\"" succeeds only when the next character is not a double quote, and it consumes no input, just like a negative lookahead in an ordinary regex. You can then match AnyChar, or some other pattern, after it.

def regex[_: P]: P[Unit] = P((!"\"" ~ AnyChar).rep).log

Here it is running in Scastie.
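
If you also want the text between the quotes rather than a P[Unit], a minimal sketch could look like the following. It assumes fastparse 2.x; the names QuotedRegexSketch, regexBody and extract are made up for illustration, and .! is fastparse's capture operator. Like the parser above, it still stops at the first double quote.

```scala
import fastparse._, NoWhitespace._

object QuotedRegexSketch {
  // Body: zero or more characters, none of which is a double quote.
  // !"\"" is a negative lookahead and consumes no input itself;
  // the trailing .! captures the matched text as a String.
  def regexBody[_: P]: P[String] = P((!"\"" ~ AnyChar).rep.!)

  // Opening quote, captured body, closing quote.
  def quotedRegex[_: P]: P[String] = P("\"" ~ regexBody ~ "\"")

  // Hypothetical helper: Some(body) on success, None on failure.
  def extract(input: String): Option[String] =
    parse(input, quotedRegex(_)) match {
      case Parsed.Success(value, _) => Some(value)
      case _: Parsed.Failure        => None
    }
}
```

For instance, QuotedRegexSketch.extract("\"str\"") should give Some("str"), and an input with no closing quote should give None.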

user
  • Thanks for your response @user. This solution mostly works but fails when double quotes are part of the regex. For example, if the regex is `/"((?:""|[^"])*)"/`, then the quoted regex would be `"/"((?:""|[^"])*)"/"`, which is why in the question I asked how to skip only the last quote. Awaiting your response! TIA – iamsmkr Nov 18 '20 at 07:44
  • @ShivamKapoor I don't understand. If the regex is surrounded by double quotes but also contains double quotes, how do you know if a double quote is part of the regex or the end of the string? For example, `"/"((?:""|[^"])*)"/"` could be interpreted as `"/"`, `"/"((?:"`, `"/"((?:""`, etc. Is there guaranteed to also be a forward slash after the first double quote and before the last double quote? – user Nov 18 '20 at 22:13
  • Let me show you the DSL that I intend to parse: `REGEX({EvtApplication.Event Text},"(\d\d\d\d)")`. What lies within the quotes is the regex, which could also include a double quote. What I need is to extract this regex alone, like so: `(\d\d\d\d)`. There is no character escaping in the DSL. – iamsmkr Nov 19 '20 at 10:15
  • @ShivamKapoor So at the end of the regex, there is guaranteed to be a `")`? If so, you can easily [modify it](https://scastie.scala-lang.org/uB3FHY5iRP6KrAoPTisTig) to use `!"\")"`. – user Nov 19 '20 at 14:05
  • No. It could be anything. That's just an example of grouping in regexes. Nothing is fixed within the quotes, just like any regex! – iamsmkr Nov 19 '20 at 14:13
  • @ShivamKapoor Then I'm afraid it will be impossible to parse. There's no way of telling if a quote is part of the regex or is at the end. You'll either have to escape the regex or mark the ends in a different way (`"""` or `/` maybe) – user Nov 19 '20 at 14:15
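
As a rough illustration of the "mark the ends in a different way" suggestion in the last comment, here is a sketch that assumes the DSL were changed to wrap the regex in triple double quotes. That delimiter is hypothetical and not part of the DSL shown in this thread, and the names DelimitedRegexSketch, delim, body and extract are invented for the example. It only works if the regex never contains three consecutive double quotes and never ends with a double quote.

```scala
import fastparse._, NoWhitespace._

object DelimitedRegexSketch {
  // Assumed end marker: three consecutive double quotes.
  def delim[_: P]: P[Unit] = P("\"\"\"")

  // Body: any characters up to, but not including, the next delimiter.
  // !delim is a negative lookahead, so single or doubled quotes
  // inside the regex are accepted as ordinary characters.
  def body[_: P]: P[String] = P((!delim ~ AnyChar).rep.!)

  // Opening delimiter, captured body, closing delimiter.
  def delimitedRegex[_: P]: P[String] = P(delim ~ body ~ delim)

  // Hypothetical helper: Some(body) on success, None on failure.
  def extract(input: String): Option[String] =
    parse(input, delimitedRegex(_)) match {
      case Parsed.Success(value, _) => Some(value)
      case _: Parsed.Failure        => None
    }
}
```

With this delimiter, the regex `/"((?:""|[^"])*)"/` from the earlier comment can be extracted whole, because none of its quotes form a run of three.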