We have a library where users can pass in dates in multiple formats. The formats follow ISO 8601 but are sometimes abbreviated.

So we get things like "19-3-12" and "2019-03-12T13:12:45.1234", where the fractional seconds can be 1 to 7 digits long. That makes for a very large number of combinations.

DateTimeFormatter.parseBest doesn't work because it won't accept "yy-m-d" for a local date. The solutions here won't work either, because they assume we know the pattern, and we don't.

And telling people to get their string formats "correct" won't work as there's a ton of existing data (these are mostly in XML & JSON files).

My question is: how can I parse strings coming in in these various patterns without having to try 15 different explicit patterns?

Or even better, is there some way to parse a string that tries everything possible and returns a Temporal object if the string makes sense as any date[time]?

David Thielen
  • OK. But... what is your question? – JB Nizet Mar 22 '19 at 13:43
  • @JBNizet - Sorry, explicit question added to the end now. – David Thielen Mar 22 '19 at 13:45
  • You're excluding the simplest solution. Just do that. – JB Nizet Mar 22 '19 at 13:46
  • @JBNizet I think calling 15 times is going to be slow. There are a lot of code paths where we try to parse to see if a string is a datetime. Usually it isn't. So we then have 15 attempts and that takes time. – David Thielen Mar 22 '19 at 13:48
  • @DavidThielen if the date order is always `y-m-d` you might as well parse it manually. Split on `T`, then first token on `-` and second token on `:` if you have one – Bentaye Mar 22 '19 at 13:50
  • I doubt that has a significant impact on the performance. You shouldn't exclude the obvious, simple solution just because you think it's going to be slow. Only optimize after you've proven that it caused a performance problem. – JB Nizet Mar 22 '19 at 13:52
  • @DavidThielen If you ask if there is a magic way to guess the format : no, there isn't. Either you format individually each date assuming that all dates coming from a given source are formatted uniformly, or you try 15 patterns in a sequence (from most secure to most ambiguous pattern) until one matches. – Arnaud Denoyelle Mar 22 '19 at 13:59
  • You can speed it up by adding a bit of complexity, but before doing that, measure and prove that it causes a performance problem. One way of speeding things up would be to only try the patterns which have a length equal/compatible with the length of the string. – JB Nizet Mar 22 '19 at 14:07

2 Answers

Trying all the possible formats would perform worse than trying only 15.

You can try to "normalize" to a single format but then you would be doing the work those 15 formats are supposed to do.

I think the best approach is the one described by @JB Nizet, to try only patterns that match string length.

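    import java.time.LocalDate;
    import java.time.LocalDateTime;
    import java.time.format.DateTimeFormatter;
    import java.time.format.DateTimeParseException;
    import java.time.temporal.TemporalAccessor;

    public static TemporalAccessor parse(String openFormat) {
        String[] formats = {"uuuu-MM-dd"}; // fallback when no length case applies
        switch (openFormat.length()) {
            case 24: // 2019-03-12T13:12:45.1234
                formats = new String[] {"uuuu-MM-dd'T'HH:mm:ss.SSSS"}; // all the formats for length 24
                break;
            // ... (cases for the other lengths)
            case 10: // uuuu-MM-dd, dd-MM-uuuu
                formats = new String[] {"uuuu-MM-dd", "dd-MM-uuuu"}; // all the formats for length 10
                break;
        }
        // now try the reduced number of formats, possibly only 1 or 2
        for (String format : formats) {
            try {
                return DateTimeFormatter.ofPattern(format)
                        .parseBest(openFormat, LocalDateTime::from, LocalDate::from);
            } catch (DateTimeParseException e) {
                // this pattern did not match; move on to the next one
            }
        }
        throw new DateTimeParseException("No known format matches", openFormat, 0);
    }

OscarRyz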

Without a full specification it is hard to give a precise recommendation. The techniques generally used for variable formats include:

  1. Trying a number of known formats in turn.
  2. Optional parts in the format pattern.
  3. DateTimeFormatterBuilder.parseDefaulting() for parts that may be absent from the parsed string.
  4. As you are aware, parseBest.

I am assuming that y-M-d always comes in this order (never M-d-y or d-M-y, for example). 19-3-12 conflicts with ISO 8601, since the standard requires (at least) a 4-digit year and a 2-digit month. A challenge with a 2-digit year is guessing the century: is this 1919 or 2019, or might it be 2119?

The good news: presence and absence of seconds and varying number of fractional digits are all built-in and pose no problem.
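
To make that concrete, the following small sketch runs a few time strings through DateTimeFormatter.ISO_LOCAL_TIME, which accepts optional seconds and up to nine fractional digits. The class and method names are made up for the example.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical class name, used only for this sketch.
class OptionalSecondsSketch {
    static LocalTime parseTime(String text) {
        // ISO_LOCAL_TIME handles HH:mm, HH:mm:ss and HH:mm:ss with a fraction.
        return LocalTime.parse(text, DateTimeFormatter.ISO_LOCAL_TIME);
    }

    public static void main(String[] args) {
        System.out.println(parseTime("13:12"));            // prints 13:12
        System.out.println(parseTime("13:12:45"));         // prints 13:12:45
        System.out.println(parseTime("13:12:45.1"));       // prints 13:12:45.100
        System.out.println(parseTime("13:12:45.1234567")); // prints 13:12:45.123456700
    }
}
```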

From what you have told us it seems to me that the following is a fair shot.

    DateTimeFormatter formatter = new DateTimeFormatterBuilder()
            .appendPattern("[uuuu][uu]-M-d")
            .optionalStart()
            .appendLiteral('T')
            .append(DateTimeFormatter.ISO_LOCAL_TIME)
            .optionalEnd()
            .toFormatter();

    TemporalAccessor dt = formatter.parseBest("19-3-12", LocalDateTime::from, LocalDate::from);
    System.out.println(dt.getClass());
    System.out.println(dt);

Output:

class java.time.LocalDate
2019-03-12

I figure that it should work with the variations of formats that you describe. Let’s just try your other example:

    dt = formatter.parseBest("2019-03-12T13:12:45.1234", LocalDateTime::from, LocalDate::from);
    System.out.println(dt.getClass());
    System.out.println(dt);

Output:

class java.time.LocalDateTime
2019-03-12T13:12:45.123400

To control the interpretation of 2-digit year you may use one of the overloaded variants of DateTimeFormatterBuilder.appendValueReduced(). I recommend that you consider a range check on top of it.
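
Here is a minimal sketch of that idea, assuming for illustration that 2-digit years should fall in 1950 through 2049 and that the caller supplies the accepted date range. The class name, the method name parseChecked, and the base year 1950 are all assumptions for the example; it handles only the 2-digit-year, date-only case.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

// Hypothetical class name, used only for this sketch.
class TwoDigitYearSketch {
    // Assumption: base year 1950, so "50" means 1950 and "49" means 2049.
    static final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
            .appendValueReduced(ChronoField.YEAR, 2, 2, 1950)
            .appendPattern("-M-d")
            .toFormatter();

    // Parses a yy-M-d string and rejects results outside [earliest, latest].
    static LocalDate parseChecked(String text, LocalDate earliest, LocalDate latest) {
        LocalDate date = LocalDate.parse(text, FORMATTER);
        if (date.isBefore(earliest) || date.isAfter(latest)) {
            throw new DateTimeException("Date outside accepted range: " + date);
        }
        return date;
    }

    public static void main(String[] args) {
        LocalDate earliest = LocalDate.of(1990, 1, 1);
        LocalDate latest = LocalDate.of(2030, 12, 31);
        System.out.println(parseChecked("19-3-12", earliest, latest)); // prints 2019-03-12
    }
}
```

With these bounds, "75-3-12" parses to 1975-03-12 and is then rejected by the range check rather than silently accepted.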

Ole V.V.