I wrote the following parser with the intent of failing on whitespace:

import scala.util.parsing.combinator._

object Foo extends JavaTokenParsers { 
  val wsTest = not(whiteSpace) // uses whitespace inherited from `RegexParsers`
}

Why does parsing a string of whitespace succeed?

scala> Foo.parseAll(Foo.wsTest, "          ")
res5: Foo.ParseResult[Unit] = [1.11] parsed: ()

scala> res5.successful
res6: Boolean = true

Looking at `Parsers#not` from the project, I would have expected a `Failure` for the test above.

  /** Wrap a parser so that its failures and errors become success and
   *  vice versa -- it never consumes any input.
   */
  def not[T](p: => Parser[T]): Parser[Unit] = Parser { in =>
    p(in) match {
      case Success(_, _)  => Failure("Expected failure", in)
      case _              => Success((), in)
    }
  }
Kevin Meredith
  • The `not` works correctly. My guess is that the parser skips white spaces by default and you have to disable that. Maybe this helps: http://stackoverflow.com/questions/3564094/parsing-a-blank-whitespace-with-regexparsers – Kigyo Sep 01 '14 at 16:27
  • `My guess is that the parser skips white spaces by default ` - I've observed this behavior with a class extending `JavaTokenParsers`. However, I would not have expected `Foo.parseAll(Foo.wsTest, " ")` to have succeeded. – Kevin Meredith Sep 01 '14 at 17:32

1 Answer

`JavaTokenParsers` extends `RegexParsers`, and `RegexParsers` contains:

 protected val whiteSpace = """\s+""".r

 def skipWhitespace = whiteSpace.toString.length > 0

 implicit def regex(r: Regex): Parser[String] = new Parser[String] {
    ... 
    val start = handleWhiteSpace(source, offset)
    ...
 }

 protected def handleWhiteSpace(source: java.lang.CharSequence, offset: Int): Int =
   if (skipWhitespace)
     (whiteSpace findPrefixMatchOf (source.subSequence(offset, source.length))) match {
       case Some(matched) => offset + matched.end
       case None => offset
     }
   else
     offset

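So every regex parser skips leading whitespace before it tries to match (you can turn this off with `override def skipWhitespace = false`).

As far as the parser is concerned, an all-whitespace input such as `"   "` therefore behaves like `""`.

`whiteSpace`, implicitly converted to a parser by `regex`, first skips the leading whitespace and then tries to match what is left, which is the empty string. That match fails, because `"""\s+"""` requires at least one whitespace character, and `not` converts the failure into a success.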
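
To make the difference concrete, here is a minimal sketch that puts the parser from the question next to a variant with whitespace skipping disabled. The object names `SkippingFoo` and `StrictFoo` are hypothetical, and the sketch assumes the `scala-parser-combinators` module is on the classpath. It calls `parse` rather than `parseAll`, because `not` never consumes input and `parseAll` would additionally require the whole input to be consumed.

```scala
import scala.util.parsing.combinator._

// Same parser as in the question: whitespace skipping is on by default.
// (Hypothetical name, for illustration only.)
object SkippingFoo extends JavaTokenParsers {
  val wsTest: Parser[Unit] = not(whiteSpace)
}

// Variant with skipping disabled, so `whiteSpace` sees the raw input.
// (Hypothetical name, for illustration only.)
object StrictFoo extends JavaTokenParsers {
  override def skipWhitespace = false
  val wsTest: Parser[Unit] = not(whiteSpace)
}

object WhitespaceDemo {
  def main(args: Array[String]): Unit = {
    // Leading whitespace is skipped first, `\s+` then fails on the empty
    // remainder, and `not` turns that failure into a success.
    println(SkippingFoo.parse(SkippingFoo.wsTest, "   ").successful)

    // No skipping: `\s+` matches the spaces, so `not` fails.
    println(StrictFoo.parse(StrictFoo.wsTest, "   ").successful)

    // Input that does not start with whitespace: `\s+` fails, `not` succeeds.
    println(StrictFoo.parse(StrictFoo.wsTest, "abc").successful)
  }
}
```

With skipping disabled, `not(whiteSpace)` should reject input that starts with whitespace, which seems to be the behaviour the question was after. Note that the override affects every regex and literal parser in the object, so other parsers there would have to handle whitespace explicitly.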

Siphor
  • Thanks, @Siphor. I actually asked this question as a follow-up to the implementation of this [answer](http://stackoverflow.com/a/25294257/409976). I added `val nonWhitespaceRegex: Regex = "\\S+".r`, along with `guard(nonWhitespaceRegex) ~> ...` in order to verify that the `Input`, i.e. tokens that I was parsing, was not all whitespace. Otherwise, checking a bunch of whitespace to be EOF will return false. – Kevin Meredith Sep 02 '14 at 00:42