
I have input with dates in many differing formats. I'm basically trying to take all the different formats and standardize them into ISO 8601 format.

If the date contains a month name, e.g. March, I use the following code to get the month number (e.g. 3 for March).

month = String.valueOf(Month.valueOf(month.toUpperCase()).getValue());

The problem is that the month names are in many different languages, with no indication of which language each one is in. I get the following error when running the code above:

Caused by: java.lang.IllegalArgumentException: No enum constant java.time.Month.AUGUSTI
    at java.lang.Enum.valueOf(Enum.java:238)
    at java.time.Month.valueOf(Month.java:106)

Is there any library that can deal with month names in many different languages, returning the numeric value, or even just translating the month name to English?

Here's a sample of the input data:

1370037600
1385852400
1356994800
2014-03-01T00:00:00
2013-06-01T00:00:00
2012-01-01
2012
May 2012
März 2010
Julio 2009
Franz Kafka
  • Why don't you just parse the date, using the support available in the standard Java library? What exactly is the input that you need to parse? – JB Nizet Sep 16 '17 at 22:46
  • @JBNizet I've updated the question adding the input data. As you can see, the formats are all very different. With regards to why I'm not using the support available in the standard Java library: I'm not sure how to do that with the given input. Also, not sure how java.time.Month is not standard Java. – Franz Kafka Sep 16 '17 at 22:52
  • OK. How do you parse them? How do you know what the language is? How do you know what the format of each input is? How do you know that Julio is a month and not a day? – JB Nizet Sep 16 '17 at 22:54
  • @JBNizet I compile some patterns (e.g., ^\d{4}$, ^(\d{4}-\d{2}-\d{2}), \w+\s+\d{4}, etc). The one thing that I will note is that the input data does match at least one of the patterns, and everything works other than converting the date when the month name is not English. All other aspects have been figured out, other than the exception in the OP above. – Franz Kafka Sep 16 '17 at 23:00
  • Do you also have different numerical formats? Like dd.mm.yyyy and mm.dd.yyyy? I don't see any good way to deal with that other than going after the provider to *please* deliver better data... – sruetti Sep 16 '17 at 23:34
  • @sruetti, The input data in the above post covers every format that is in the files. I'd love to be dealing with better data, but unfortunately that's not going to happen. My job is to standardize all this data and create the better data :) – Franz Kafka Sep 16 '17 at 23:57

1 Answer


If you have no idea what language the month name is in, the only way is to loop through all the locales returned by java.util.Locale.getAvailableLocales(), apply each one to a java.time.format.DateTimeFormatter, and try to parse the month until you find a locale that works:

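import java.time.Month;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.util.Locale;

String input = "März 2010";
// formatter with month name and year; case-insensitive so that "Julio" matches "julio"
DateTimeFormatter fmt = new DateTimeFormatterBuilder()
    .parseCaseInsensitive()
    .appendPattern("MMMM yyyy")
    .toFormatter();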
Month month = null;
for (Locale loc : Locale.getAvailableLocales()) {
    try {
        // set the locale in the formatter and try to get the month
        month = Month.from(fmt.withLocale(loc).parse(input));
        break; // found, no need to parse in other locales
    } catch (DateTimeParseException e) {
        // can't parse, go to next locale
    }
}
if (month != null) {
    System.out.println(month.getValue()); // 3
}

In the code above, the month will be Month.MARCH and the output will be 3.
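
Since the goal in the question is ISO 8601 output, the same locale loop can produce a year-month string directly. The following is only a sketch under a few assumptions: the class name `MonthYearNormalizer` and the method `toIsoYearMonth` are made up for illustration, the input is expected to look like "month-name year", and the result depends on which locale data the running JDK ships.

```java
import java.time.DateTimeException;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not part of any library.
public class MonthYearNormalizer {

    // Case-insensitive so that "Julio" also matches a locale's "julio".
    private static final DateTimeFormatter FMT = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern("MMMM yyyy")
            .toFormatter();

    // Returns e.g. "2010-03", or an empty Optional if no locale can parse the input.
    public static Optional<String> toIsoYearMonth(String input) {
        String text = input.trim();
        for (Locale loc : Locale.getAvailableLocales()) {
            try {
                YearMonth ym = YearMonth.from(FMT.withLocale(loc).parse(text));
                return Optional.of(ym.toString()); // YearMonth prints as yyyy-MM
            } catch (DateTimeException e) {
                // not parseable in this locale, try the next one
            }
        }
        return Optional.empty();
    }
}
```

With German locale data available, `MonthYearNormalizer.toIsoYearMonth("März 2010")` should give `Optional[2010-03]`, while a string with no month name at all gives an empty `Optional`.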

  • And if you need to do it a lot, you could instead format the month name for every `Locale`, and build a `Map`. You could use a `TreeMap` with a `Collator` that ignores case and accents. – Andreas Sep 16 '17 at 22:57
  • Care to share that code Andreas? – Jacques Koorts Jul 13 '22 at 12:37
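
The `Map` idea from the comment above was never posted as code, so what follows is only one possible reading of it, not Andreas's actual implementation. The class name `MonthNameLookup` and the method `lookup` are invented for this sketch. It formats every month in every available locale once, and stores the names in a `TreeMap` ordered by a `Collator` at primary strength, which treats case and accent differences as equal.

```java
import java.text.Collator;
import java.time.Month;
import java.time.format.TextStyle;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of any library.
public class MonthNameLookup {

    private static final Map<String, Month> INDEX = buildIndex();

    private static Map<String, Month> buildIndex() {
        Collator collator = Collator.getInstance(Locale.ROOT);
        collator.setStrength(Collator.PRIMARY); // ignore case and accent differences
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION); // handle precomposed accents
        Map<String, Month> index = new TreeMap<String, Month>(collator);
        TextStyle[] styles = {TextStyle.FULL, TextStyle.FULL_STANDALONE};
        for (Locale loc : Locale.getAvailableLocales()) {
            for (Month month : Month.values()) {
                for (TextStyle style : styles) {
                    // First name registered wins if two locales disagree (assumption of this sketch).
                    index.putIfAbsent(month.getDisplayName(style, loc), month);
                }
            }
        }
        return index;
    }

    // Returns the matching Month, or null if the name is unknown.
    public static Month lookup(String monthName) {
        return INDEX.get(monthName.trim());
    }
}
```

A call such as `MonthNameLookup.lookup("März").getValue()` then replaces the per-input locale loop. One caveat: some names are ambiguous across languages (for example, "listopad" is October in Croatian but November in Czech and Polish), and this sketch silently keeps whichever locale was seen first.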