
I'm writing a parser for a command line interface of an external tool and I'm using Scala's parser combinators library. As part of this I need to parse a standard date of the format EEE MMM d HH:mm:ss yyyy Z.

Scala's parser combinators are "stream-based" and work with CharSequences instead of Strings. That makes it hard for me to use either java.text.DateFormat or DateTimeFormat from JodaTime, since they both work with Strings.

As of now, I have had to write my own regex parser like this to parse the date, but I would much rather incorporate the work that has been done in JodaTime into my parser. I really don't want to reinvent the wheel. I've been looking at the source code of JodaTime and I'm not really sure why it needs to work with Strings instead of just CharSequences. Am I missing some aspect?

Viktor Hedefalk

3 Answers


Got it, now. Ok, there's a simpler solution than forking. Here:

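import org.joda.time.{DateTime, MutableDateTime}
import org.joda.time.format.DateTimeFormat
import scala.util.parsing.combinator.RegexParsers

trait DateParsers extends RegexParsers {
  def dateTime(pattern: String): Parser[DateTime] = new Parser[DateTime] {
    val dateFormat = DateTimeFormat.forPattern(pattern)

    // Returns the parsed date and the absolute end position (negative on failure).
    def jodaParse(text: CharSequence, offset: Int) = {
      val mutableDateTime = new MutableDateTime
      // Copy only a window of the input into a String. The length is an estimate,
      // not a guaranteed bound, and is clamped to the end of the input.
      val end = math.min(text.length, offset + dateFormat.getParser.estimateParsedLength)
      val maxInput = text.subSequence(offset, end).toString
      val newPos = dateFormat.parseInto(mutableDateTime, maxInput, 0)
      // parseInto signals failure with a negative value, so only shift successful positions.
      (mutableDateTime.toDateTime, if (newPos >= 0) newPos + offset else newPos)
    }

    def apply(in: Input) = {
      val source = in.source
      val offset = in.offset
      val start = handleWhiteSpace(source, offset)
      val (dateTime, endPos) = jodaParse(source, start)
      if (endPos >= 0)
        Success(dateTime, in.drop(endPos - offset))
      else
        Failure("Failed to parse date", in.drop(start - offset))
    }
  }
}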
Daniel C. Sobral
  • Sorry, sure! But the input is still `Reader[Elem]` with `source`-method returning `CharSequence`. I don't see how I could use this to be able to pass a String into an external parsing-framework. For me this seems like an impossible thing, since the combinators work on streams. You basically don't know how many characters the date-string will have and therefore need a date-parser that works on streams too, and gives back info of how much of the stream it has consumed. – Viktor Hedefalk Feb 03 '11 at 14:37
  • I mean the input of the final combined parser. The produced output of the parser I'm trying to write should be a date. – Viktor Hedefalk Feb 03 '11 at 14:51
  • @hedefalk It is `scala.util.parsing.input.Reader`, which you can subclass to whatever need you have. For instance, you could use a lexer to return tokens, and then process using tokens. Please, post some code of what you are trying to do, because it isn't clear. – Daniel C. Sobral Feb 03 '11 at 15:18
  • I'm sorry about the lack of clarity in my original question, but I'm trying to make up for it. My own answer to my question shows the `Parser` that I want. It's a `Parser[DateTime]`, but it uses an external parser for parsing dates by patterns ("yyyyMMdd"). The only two of these I've used are joda-time and the standard Java stuff. Both of them need `String`s as input and I just don't have any `String`s to give them. If `String` were an interface that I could implement, then I could wrap my `scala.util.parsing.input.Reader` to be a String, but that is not the case. – Viktor Hedefalk Feb 03 '11 at 15:37
  • cont… The call: `val newPos = dateFormat.parseInto(mutableDateTime, text, offset)` would not have been possible with joda-time before my change since `text` was typed to `String`. – Viktor Hedefalk Feb 03 '11 at 15:57
  • I really appreciate your help! I was thinking of doing something like that - finding an upper bound on the size of the date-string, but it felt like a hack. If there is an official method estimateParsedLength, it feels less of a hack, so thanks! – Viktor Hedefalk Feb 03 '11 at 20:35

I'm not sure what you are asking. Are you asking why the `in` parameter of RegexParsers.parse() takes a CharSequence? If so, there's another overload of RegexParsers.parse() that takes a Reader, for which you can write a simple conversion function like so:

def stringToReader(str: String): java.io.Reader = new java.io.StringReader(str)

As to the date format, I find it perfectly fine to define it as a token in the parser.

Hope this helps.

Y.H Wong
  • Maybe my own answer clarified my question a bit…? What I'm trying to achieve is a parser that can be used in the production rules of other parsers and that produces a date. My own answer shows this, but I had to make changes to joda-time for it to accept `CharSequence`s. – Viktor Hedefalk Feb 03 '11 at 14:57
  • My question is not really about Scala's parser combinators, but whether there is any established way to parse dates from patterns like "EE MMM d HH:mm:ss yyyy Z" that works on streams of characters, so I can use it from within a combinator parser. – Viktor Hedefalk Feb 03 '11 at 15:02

This is my solution right now:

I forked joda-time and made small changes for it to work on CharSequences instead of Strings. It's over here https://github.com/hedefalk/joda-time/commit/ef3bdafd89b334fb052ce0dd192613683b3486a4

Then I could write a DateParser like this:

trait DateParsers extends RegexParsers {
  def dateTime(pattern: String): Parser[DateTime] = new Parser[DateTime] {
    val dateFormat = DateTimeFormat.forPattern(pattern);

    def jodaParse(text: CharSequence, offset: Int) = {
      val mutableDateTime = new MutableDateTime
      val newPos = dateFormat.parseInto(mutableDateTime, text, offset)
      (mutableDateTime.toDateTime, newPos)
    }

    def apply(in: Input) = {
      val source = in.source
      val offset = in.offset
      val start = handleWhiteSpace(source, offset)
      val (dateTime, endPos) = jodaParse(source, start)
      if (endPos >= 0)
        Success(dateTime, in.drop(endPos - offset))
      else
        Failure("Failed to parse date", in.drop(start - offset))
    }
  }
}

Then I can use this trait to have production rules like:

private[this] def dateRow = "date:" ~> dateTime("EEE MMM d HH:mm:ss yyyy Z")

Am I overworking this? I'm really tired right now…

Viktor Hedefalk
  • Ah I see. So essentially you want the datetime regex lexer to advance as soon as the next byte from the network comes in. It does seem like there's no other way to do it except either forking joda-time and rewriting its matching to use regex or, better yet, writing your own stream-based/regex datetime parsing library and sharing it with us. I still believe your answer here is the way to go. – Y.H Wong Feb 04 '11 at 13:41
  • Yeah, I haven't found any other stream-based datetime parsing library. Plus joda-time is the one and only date API to use anyway, right? I emailed Stephen Colebourne about it and he agreed that it should parse CharSequence, but that there might be issues with backwards compatibility so he couldn't promise anything. Worst case, I'll have to maintain my fork :) – Viktor Hedefalk Feb 04 '11 at 23:42
  • Or not. I wouldn't say joda-time is the end-all solution for all your datetime needs. I've bumped into situations where joda-time's patterns just couldn't express the format I needed and I had to fall back to SimpleDateFormat (ISO 8601 and RFC 3339 have LOTS of variations). – Y.H Wong Feb 06 '11 at 07:42
  • Writing your own regex-based datetime format library really wouldn't be that difficult: for each pattern letter there is a direct 1-to-1 mapping to a regex pattern. You can write that as a new class under org.joda.time.format and submit a patch. – Y.H Wong Feb 06 '11 at 07:45
  • As an update: I remember I made a PR for JSR 310 that got incorporated, and I think it stayed as CharSequence in JDK 8. The world is a better place. – Viktor Hedefalk Sep 02 '15 at 22:41
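
The idea from the comments that each pattern letter maps to a regex fragment can be sketched roughly as follows. This is only an illustration under loud assumptions: `PatternToRegex`, `fragment` and `toRegex` are hypothetical names, the mapping covers just the letters in "EEE MMM d HH:mm:ss yyyy Z", and it ignores locale-specific names and quoting rules that a real pattern language has.

```scala
import scala.util.matching.Regex

object PatternToRegex {
  // Hypothetical, incomplete mapping: covers only the letters in
  // "EEE MMM d HH:mm:ss yyyy Z" and ignores locale-specific names.
  private def fragment(letter: Char, count: Int): String = (letter, count) match {
    case ('E', _)           => "[A-Za-z]+"          // day-of-week text
    case ('M', n) if n >= 3 => "[A-Za-z]+"          // month text
    case ('Z', _)           => "[+-]\\d{4}"         // offset such as +0100
    case ('y', n)           => "\\d{" + n + ",}"    // at least n year digits
    case ('d' | 'H' | 'm' | 's' | 'M', 1) => "\\d{1,2}"
    case ('d' | 'H' | 'm' | 's' | 'M', n) => "\\d{" + n + "}"
    case (other, n)         => Regex.quote(other.toString * n)
  }

  // Walks the pattern, groups runs of the same character, and concatenates
  // one regex fragment per run. Non-letters are matched literally.
  def toRegex(pattern: String): Regex = {
    val sb = new StringBuilder
    var i = 0
    while (i < pattern.length) {
      val c = pattern.charAt(i)
      val run = pattern.drop(i).takeWhile(_ == c).length
      sb ++= (if (c.isLetter) fragment(c, run) else Regex.quote(c.toString * run))
      i += run
    }
    sb.toString.r
  }
}
```

Such a regex only recognizes where the date text ends. The matched substring, for example from `findPrefixOf` on the remaining input, would still have to be handed to a real date formatter to produce a date value.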
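
To illustrate the JDK 8 outcome mentioned in the last comment: `java.time.format.DateTimeFormatter.parse(CharSequence, ParsePosition)` accepts a `CharSequence` directly and reports how far it read. The following is a minimal sketch, not a drop-in replacement for the Joda-Time code above. The names `CharSequenceDateParsing` and `parseDateAt` are hypothetical, and `Locale.ENGLISH` is an assumption about the tool's output.

```scala
import java.text.ParsePosition
import java.time.OffsetDateTime
import java.time.format.DateTimeFormatter
import java.util.Locale
import scala.util.Try

object CharSequenceDateParsing {
  // Assumption: day and month names are printed in English.
  private val formatter =
    DateTimeFormatter.ofPattern("EEE MMM d HH:mm:ss yyyy Z", Locale.ENGLISH)

  // Parses a date starting at `offset` without copying the input into a String.
  // Returns the date and the absolute index just past it, or None on failure.
  def parseDateAt(text: CharSequence, offset: Int): Option[(OffsetDateTime, Int)] = {
    val position = new ParsePosition(offset)
    // parse throws on malformed input; Try turns that into None.
    Try(OffsetDateTime.from(formatter.parse(text, position))).toOption
      .map(date => (date, position.getIndex))
  }
}
```

Because the returned index is absolute, a combinator's `apply` could, under the same assumptions, call `parseDateAt(in.source, start)` and continue with `in.drop(end - in.offset)`, mirroring the structure of the `DateParsers` trait.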