
So I am creating a WML-like language for my assignment, and as a first step I am supposed to create regular expressions to recognize the following:

//single = "{"
//double = "{{"
//triple = "{{{"

Here is my code for the second one:

val double = "\\{\\{\\b".r

And my test is:

println(double.findAllIn("{{ s{{ { {{{ {{ {{x").toArray.mkString(" "))

But it doesn't print anything! It's supposed to print the first, second, fifth and sixth tokens. I have tried every combination of \b and \B, and even \{{2,2} instead of \{\{, but it's still not working. Any help?

As a side question, if I wanted it to match just the first and fifth tokens, what would I need to do?

dk14
Donat

1 Answer


I tested your code (Scala 2.12.2 REPL), and contrary to your "it doesn't print anything" statement, it actually prints the "{{" occurrence from the "{{x" substring.

This is because x is a word character, and \b matches the position between the second { and x. Keep in mind that { isn't a word character, unlike x.

As per this tutorial

It matches at a position that is called a "word boundary". This match is zero-length

There are three different positions that qualify as word boundaries:

1) Before the first character in the string, if the first character is a word character

...
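A quick way to see this in the REPL is to ask for match positions instead of just the matched text. This is a small sketch that reuses the question's regex and test string; `findAllMatchIn` is the standard `scala.util.matching.Regex` method returning `Match` objects, and the names `hits` and `wordChar` are only for illustration:

```scala
// The question's regex and test string.
val double = "\\{\\{\\b".r
val input  = "{{ s{{ { {{{ {{ {{x"

// Collect (start offset, matched text) for every match.
val hits = double.findAllMatchIn(input).map(m => (m.start, m.matched)).toList

// '{' is not a word character, so \b right after "{{" only holds when the
// next character IS a word character, which happens only in "{{x".
println(hits) // List((16,{{))

// \w is [a-zA-Z_0-9]: 'x' qualifies, '{' does not.
val wordChar = "\\w".r
println(wordChar.findFirstIn("{")) // None
println(wordChar.findFirstIn("x")) // Some(x)
```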

As for a solution, it depends on your precise definition, but lookarounds seemed to work for me:

"(?<!\\{)\\{{2}(?!\\{)".r

It matched the first, second, fifth and sixth tokens. The expression says: match "{{" that is neither preceded nor followed by "{".
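To double-check which tokens are hit, here is a sketch (assuming the question's test string) that prints the start offset of each match and the 1-based index of each space-separated token containing one. Splitting on a single space is an assumption that only holds because this test string uses single spaces; `starts` and `matchingTokens` are illustrative names:

```scala
// The question's test string; tokens are separated by single spaces.
val input  = "{{ s{{ { {{{ {{ {{x"
// "{{" that is neither preceded nor followed by another "{".
val double = "(?<!\\{)\\{{2}(?!\\{)".r

// Start offset of every match in the whole string.
val starts = double.findAllMatchIn(input).map(_.start).toList
println(starts) // List(0, 4, 13, 16)

// 1-based indices of the tokens that contain a match.
val matchingTokens = input.split(" ").toList.zipWithIndex.collect {
  case (token, i) if double.findFirstIn(token).isDefined => i + 1
}
println(matchingTokens) // List(1, 2, 5, 6)
```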

For side-question:

"(?<![^ ])\\{\\{(?![^ ])".r //match `{{` surrounded by spaces or line boundaries

Or, depending on your interpretation of "space":

"(?<!\\S)\\{\\{(?!\\S)".r

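Both matched the 1st and 5th tokens. I used negative lookarounds rather than positive ones because I wanted line beginnings and endings (boundaries) to be taken into account automatically. The double negation by ! and [^ ] implicitly includes ^ and $: at the start or end of the input there is no preceding or following character at all, so "not preceded/followed by a non-space" succeeds there. Alternatively, you could use: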

"(?<=^|\\s)\\{\\{(?=\\s|$)".r
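The three side-question variants can be compared side by side. This sketch assumes the question's test string plus a hypothetical tab-separated string `tabbed`, added only to show where `[^ ]` (anything but a literal space) and `\S` (anything but whitespace) disagree; the helper `starts` is likewise just for illustration:

```scala
import scala.util.matching.Regex

val input  = "{{ s{{ { {{{ {{ {{x"
// Hypothetical extra input: "{{" surrounded by tabs instead of spaces.
val tabbed = "x\t{{\ty"

val patterns = List(
  "(?<![^ ])\\{\\{(?![^ ])".r,  // no non-space char on either side
  "(?<!\\S)\\{\\{(?!\\S)".r,    // no non-whitespace char on either side
  "(?<=^|\\s)\\{\\{(?=\\s|$)".r // whitespace or start/end of input on each side
)

// Start offsets of all matches of r in s.
def starts(r: Regex, s: String): List[Int] =
  r.findAllMatchIn(s).map(_.start).toList

val results    = patterns.map(starts(_, input))
val tabResults = patterns.map(starts(_, tabbed))

println(results)    // List(List(0, 13), List(0, 13), List(0, 13))
println(tabResults) // List(List(), List(2), List(2))
```

On the question's input all three agree; only the literal-space version rejects "{{" next to a tab.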

You can read about lookarounds here. Basically, a lookaround checks that a symbol or expression is present (or absent) at a boundary without consuming it: it has to match, but what it matches is not included in the matched string itself.

Some examples of lookarounds:

  • (?<=z)aaa matches "aaa" that is preceded by z
  • (?<!z)aaa matches "aaa" that is not preceded by z
  • aaa(?=z) matches "aaa" followed by z
  • aaa(?!z) matches "aaa" not followed by z
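The four examples can be run against one made-up string (the input below is an assumption, chosen so each rule fires somewhere). Note that the matched text is always just "aaa": the z checked by the lookaround is never part of it.

```scala
// Made-up input: "aaa" after z, after b, before z, and before b.
val input = "zaaa baaa aaaz aaab"

val examples = List(
  "(?<=z)aaa", // "aaa" preceded by z
  "(?<!z)aaa", // "aaa" not preceded by z
  "aaa(?=z)",  // "aaa" followed by z
  "aaa(?!z)"   // "aaa" not followed by z
)

// Start offsets of the matches for each example pattern.
val starts = examples.map(p => p.r.findAllMatchIn(input).map(_.start).toList)
examples.zip(starts).foreach { case (p, s) => println(s"$p -> $s") }
// (?<=z)aaa -> List(1)
// (?<!z)aaa -> List(6, 10, 15)
// aaa(?=z) -> List(10)
// aaa(?!z) -> List(1, 6, 15)

// Every match is exactly "aaa"; the lookaround's z is not consumed.
val texts = examples.flatMap(p => p.r.findAllIn(input).toList).distinct
println(texts) // List(aaa)
```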

P.S. To make your life easier, Scala has triple-quoted strings (""") in which backslashes don't need to be escaped, so instead of:

"(?<!\\S)\\{\\{(?!\\S)".r

you can just write:

"""(?<!\S)\{\{(?!\S)""".r
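If you want to convince yourself that the two spellings are equivalent, compare the pattern text each literal produces (`Regex.regex` returns the underlying pattern string). A minimal check, reusing the question's test string:

```scala
val escaped = "(?<!\\S)\\{\\{(?!\\S)".r
val raw     = """(?<!\S)\{\{(?!\S)""".r

// Both literals produce exactly the same pattern text...
println(escaped.regex == raw.regex) // true

// ...and therefore the same matches.
val input = "{{ s{{ { {{{ {{ {{x"
println(escaped.findAllIn(input).toList == raw.findAllIn(input).toList) // true
```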
dk14
  • Thank you! Haven't tested it yet, but would you care to explain what each chunk of symbols in the parentheses actually does? – Donat Feb 25 '18 at 17:45
  • @user7552492 Those are lookarounds; I added an explanation to the answer. Basically, they are a bit like capturing groups (e.g. `"ab(za)k".r` matches "abzak", and you can extract "za" as `match.group(1)`), but without the need to extract values programmatically. – dk14 Feb 25 '18 at 17:57
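To make the comparison in this comment concrete, here is a sketch contrasting the capturing-group pattern from the comment with a lookaround version. The pattern `(?<=ab)za(?=k)` is an illustrative addition, not part of the original comment:

```scala
// Capturing group: the whole match is "abzak"; "za" must be read from group(1).
val withGroup = "ab(za)k".r
val m = withGroup.findFirstMatchIn("abzak").get
println(m.matched)  // abzak
println(m.group(1)) // za

// Lookarounds: "ab" and "k" are required but not consumed, so the match itself is "za".
val withLookarounds = "(?<=ab)za(?=k)".r
println(withLookarounds.findFirstIn("abzak")) // Some(za)
```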