
I have a library that supports a custom language. Its parser is written using Scala's RegexParsers. I'm now trying to rewrite the parser with the fastparse library to speed up our engine. The question is: is it possible to correctly parse the params of a function in our pseudo-language?

Here is an example:

$out <= doSomething('/mypath[text() != '']', 'def f(a) {a * 2}', ',') <= $in

Here doSomething is a function with 3 params:

  1. /mypath[text() != '']
  2. def f(a) {a * 2}
  3. ,

I'm expecting to get a tree for the function with params:

Function(
    name = doSomething
    params = List[String](
        "/mypath[text() != '']",
        "def f(a) {a * 2}",
        ","
    )
)

What I've tried:

val ws = P(CharsWhileIn(" \r\n"))
def wsSep(sep: String) = P(ws.? ~ sep ~ ws.?)
val name = P(CharIn('a' to 'z', 'A' to 'Z').rep)
val param = P(ws.? ~ "'" ~ CharPred(_ != '\'').rep ~ "'" ~ ws.?)
val params = P("(" ~ param.!.rep(sep = wsSep(",")) ~ ")")
val function = P(name.! ~ params.?).map { case (name, params) => Function(name, params.getOrElse(List())) }

The problem is that single quotes delimit a string in my language, but the string itself sometimes contains additional single quotes, like here:

/mypath[text() != '']

So I can't use CharPred(_ != '\'') in my case.

We also have commas inside strings, as in the 3rd param.
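To make the two problems concrete, here is a small sketch that uses only the Scala standard library (no fastparse). The object name NaiveSplit and its values are made up for illustration. It shows that stopping at the first quote truncates the first param, and that splitting on commas yields four pieces instead of three.

```scala
object NaiveSplit {
  // Argument list from the example above, without the surrounding parentheses.
  val args = "'/mypath[text() != '']', 'def f(a) {a * 2}', ','"

  // "Everything up to the next quote" stops at the quote inside the XPath,
  // so only the prefix "/mypath[text() != " is returned.
  val firstByQuote: String = args.drop(1).takeWhile(_ != '\'')

  // Splitting on commas also cuts through the third param, which is itself a comma.
  val byComma: Array[String] = args.split(",")
}
```

Neither a "not a quote" predicate nor a plain comma split is enough, so the end of a string has to be decided by looking at what follows the quote.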

This somehow works with the Scala parser, but I can't parse the same input using fastparse.

Does anyone have an idea how to make the parser work properly?

Update

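Got it! The main magic is in val param: a quote ends the string only when it is followed by optional whitespace and then a comma or a closing parenthesis.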

object X {

  import fastparse.all._

  case class Fn(name: String, params: Seq[String])

  val ws = P(CharsWhileIn(" \r\n"))
  def wsSep(sep: String) = P(ws.? ~ sep ~ ws.?)
  val name = P(CharIn('a' to 'z', 'A' to 'Z').rep)
  // Consume any char unless we are at a quote followed by optional whitespace and ',' or ')'
  val param = P(ws.? ~ "'" ~ (!("'" ~ ws.? ~ ("," | ")")) ~ AnyChar).rep  ~ "'" ~ ws.?)
  val params = P("(" ~ param.!.rep(sep = wsSep(",")) ~ ")")
  val function = P(name.! ~ params.?).map{case (name, params) => Fn(name, params.getOrElse(Seq()))}
}


object Test extends App {
  import fastparse.all._ // brings Parsed.Success / Parsed.Failure into scope

  val res = X.function.parse("myFunction('/hello[name != '']' , 'def f(a) {mytest}', ',')")
  res match {
    case Parsed.Success(r, _) =>
      println(s"name: ${r.name}")
      println(s"params:\n${r.params.mkString("\n")}")
    case Parsed.Failure(_, _, extra) => println(extra)
  }
}

Output:

name: myFunction
params:
'/hello[name != '']' 
'def f(a) {mytest}'
','
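Note that the captured params above still include the surrounding quotes (and any trailing whitespace), because .! is applied to the whole param. A possible variant, sketched here under the assumption of the same fastparse 0.4.x API imported via fastparse.all._, moves the capture onto the string body so the result matches the expected tree from the question. The names BareParams and closing are introduced only for this illustration.

```scala
import fastparse.all._

object BareParams {
  case class Fn(name: String, params: Seq[String])

  val ws = P(CharsWhileIn(" \r\n"))
  def wsSep(sep: String) = P(ws.? ~ sep ~ ws.?)
  val name = P(CharIn('a' to 'z', 'A' to 'Z').rep(1))

  // A quote closes the string only if ',' or ')' follows, optionally after whitespace.
  val closing = P("'" ~ ws.? ~ ("," | ")"))

  // .! is applied to the body only, so quotes and whitespace are not captured.
  val param = P(ws.? ~ "'" ~ (!closing ~ AnyChar).rep.! ~ "'" ~ ws.?)
  val params = P("(" ~ param.rep(sep = wsSep(",")) ~ ")")
  val function = P(name.! ~ params.?).map { case (n, ps) => Fn(n, ps.getOrElse(Seq())) }
}
```

With this variant, parsing the same test input should yield the params /hello[name != ''], def f(a) {mytest} and a single comma, without the enclosing quotes.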
dyrkin
  • The language you are trying to read must have some sort of definition on how to handle this. If I understand correctly, in this language you can have unescaped quotes inside a string? So how then can you tell if a quote is ending a string or is just a character in that string? – puhlen May 23 '17 at 16:59
  • I see, but this works somehow in the old implementation. I'm looking for something like: open quote -> look for closing quote -> if the closing quote is not followed by a comma or closing parenthesis (which means it is part of the string), continue looking for the closing quote; otherwise close the string – dyrkin May 23 '17 at 19:14
  • okay then that is your rule, you close a string on a quote followed by a comma or closing parenthesis. This means you need to be able to look at more than one character while tokenizing your input. I don't know fastparse so I can't tell you how it's done with that library. A java regex matching your string would probably look similar to `'.*?'[,)]` – puhlen May 23 '17 at 19:18
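The rule from the comments can also be tried out with a plain regex, independent of fastparse. The following is a minimal sketch using scala.util.matching.Regex: it adapts the suggested pattern by adding a capture group and a lookahead that tolerates whitespace before the comma or parenthesis. The names QuotedArgs, Arg and extract are made up for this example.

```scala
import scala.util.matching.Regex

object QuotedArgs {
  // Opening quote, lazily matched body, then a closing quote that must be
  // followed (after optional whitespace) by ',' or ')'. The lookahead does
  // not consume the separator, so the next match starts at the next opening quote.
  val Arg: Regex = """'(.*?)'(?=\s*[,)])""".r

  // Returns the bodies of all quoted arguments found in the call text.
  def extract(call: String): List[String] =
    Arg.findAllMatchIn(call).map(_.group(1)).toList
}
```

For example, QuotedArgs.extract("doSomething('/mypath[text() != '']', 'def f(a) {a * 2}', ',')") returns the three params from the question without their enclosing quotes. Like the fastparse version, it would misread a string whose content itself contains a quote directly followed by a comma or closing parenthesis.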
