
I have the following question about Java regular expressions.

When I define a regular expression using this pattern:

String pattern = "(\\d{4})\\d{2}\\d{2}";

and the input string is "20180808", I get group(0) = 20180808 as expected, but:

group(1) - no match
group(2) - 08
group(3) - 08

I am sure the same regular expression works in other languages, such as Python and C#.

Can anyone help? Thanks for your expert advice.

@Test
public void testParseDateStringToMinimumOfTheDate() {
    try {
        UtilsFactory utilsFactory = UtilsFactory.getInstance();
        DateUtils dateUtils = utilsFactory.getInstanceOfDateUtils();
        CalendarUtils calendarUtils = utilsFactory.getInstanceOfCalendarUtils();
        calendarUtils.parseDateStringToMinimumOfTheDate("20180808");
    } catch (Exception e) {
        e.printStackTrace();
    }
}

public Calendar parseDateStringToMinimumOfTheDate(String dateString_yyyyMMdd) throws Exception {
    Calendar cal = null;
    String pattern = "(\\d{4})\\d{2}\\d{2}";
    try {
        cal = getMaxUtcCalendarToday();
        List<String> matchStringList = regMatch(dateString_yyyyMMdd, pattern);
        for (int i = 0; i < matchStringList.size(); i++) {

        }
    } catch (Exception e) {
        logger.error(getClassName() + ".parseDateStringToBeginningOfTheDate()- dateString_yyyyMMdd="
                + dateString_yyyyMMdd, e);
        throw e;
    }
    return cal;
}

private List<String> regMatch(String sourceString, String patternString) throws Exception {
    List<String> matchStrList = null;
    Pattern pattern = null;
    Matcher matcher = null;
    try {
        matchStrList = new ArrayList<String>();
        pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
        matcher = pattern.matcher(sourceString);
        while (matcher.find()) {
            matchStrList.add(matcher.group());
        }
    } catch (Exception e) {
        logger.error(
                getClassName() + ".regMatch() - sourceString=" + sourceString + ",patternString=" + patternString,
                e);
        throw e;
    }
    return matchStrList;
}
Ole V.V.
manpakhong
  • Sorry, a correction: my input pattern is `String pattern = "(\\d{4})(\\d{2})(\\d{2})";` and the result is: group(0) = 20180808 group(1) = – manpakhong Sep 02 '18 at 11:00
  • Hi @manpakhong, you can also edit your question for clarification. The regex is fine; I tested it on [regex101.com](https://regex101.com). – Emaro Sep 02 '18 at 11:03
  • This is a bit off-topic, but I would use [SimpleDateFormat](https://www.google.com/search?q=java%20simpledateformat) for parsing a date string: `String dateStr = "20180808"; DateFormat dateFormat = new SimpleDateFormat("yyyyMMdd"); Calendar cal = Calendar.getInstance(); cal.setTime(dateFormat.parse(dateStr));` – haba713 Sep 02 '18 at 11:18
  • You don't seem to have asked a question in your post. What is the problem that you are having? – Sweeper Sep 02 '18 at 11:27
  • @haba713 Please don’t teach the young ones to use the long outdated and notoriously troublesome `SimpleDateFormat` class. At least not as the first option. And not without any reservation. Today we have so much better in [`java.time`, the modern Java date and time API](https://docs.oracle.com/javase/tutorial/datetime/) and its `DateTimeFormatter`. – Ole V.V. Sep 02 '18 at 12:41
  • @manpakhong Could you provide [a Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) that doesn’t use classes that we don’t have on our computers, please? I’d really like to run your code to see what happens, and how I might change it to answer your question, but I cannot. – Ole V.V. Sep 02 '18 at 12:44
  • @Ole V.V., @manpakhong seems to want to retrieve a `Calendar` for a date string. If getting the year, month and day of month is enough, try this: `String dateStr = "20180808"; DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyyMMdd"); LocalDate date = LocalDate.parse(dateStr, formatter); int year = date.getYear(); int month = date.getMonthValue(); int day = date.getDayOfMonth();` – haba713 Sep 02 '18 at 12:54
  • @haba713 IMHO wanting a `Calendar` is no excuse for using `SimpleDateFormat` (even though those classes are friends). My answer shows how to do that instead (leaving aside that ideally you should not want a `Calendar` either). – Ole V.V. Sep 02 '18 at 13:18

2 Answers

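    import java.time.LocalDate;
    import java.time.format.DateTimeFormatter;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    Pattern pattern = Pattern.compile("\\d{8}");
    String sourceString = "20180808";
    Matcher matcher = pattern.matcher(sourceString);
    while (matcher.find()) {
        LocalDate date = LocalDate.parse(matcher.group(), DateTimeFormatter.BASIC_ISO_DATE);
        System.out.println(date);
    }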

The output from this snippet is the expected date:

2018-08-08

If your string may contain more text than just the 8-digit date, it is correct to use a regular expression to extract those 8 digits. The correct class to use for a date is LocalDate from java.time, the modern Java date and time API. It’s a date in the ISO calendar system without time of day and without time zone. The Calendar class, by contrast, represents a date and time with a time zone in some calendar system. It’s much more than you need. Also, the Calendar class is long outdated; it was poorly designed and was replaced by java.time, which came out with Java 8 in 2014.
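
In case it helps, here is a minimal sketch of the "more text than just the date" situation. The class name DateExtractionSketch and the method firstDateIn are made up for illustration; only Pattern, Matcher, LocalDate and DateTimeFormatter.BASIC_ISO_DATE are real library names. It assumes that an 8-digit run that is not a valid calendar date should simply be skipped, which may or may not be what you want.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateExtractionSketch {

    private static final Pattern EIGHT_DIGITS = Pattern.compile("\\d{8}");

    // Hypothetical helper: returns the first 8-digit run in the text that parses as a date.
    static Optional<LocalDate> firstDateIn(String text) {
        Matcher matcher = EIGHT_DIGITS.matcher(text);
        while (matcher.find()) {
            try {
                return Optional.of(LocalDate.parse(matcher.group(), DateTimeFormatter.BASIC_ISO_DATE));
            } catch (DateTimeParseException e) {
                // Assumption: digits that do not form a valid date are ignored; keep searching.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstDateIn("Report generated on 20180808 for review"));
        System.out.println(firstDateIn("no digits here"));
    }
}
```

Returning an Optional rather than null is just one possible design; throwing an exception when no date is found would work equally well if a missing date is an error in your application.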

If you do need a Calendar object for some legacy API that you cannot change or don’t want to change just now, convert like this:

    ZoneId zone = ZoneId.of("America/Punta_Arenas");
    ZonedDateTime startOfDay = date.atStartOfDay(zone);
    Calendar cal = GregorianCalendar.from(startOfDay);

Please substitute the correct time zone if it didn’t happen to be America/Punta_Arenas.
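
For completeness, the parsing and the conversion could be put together roughly as follows. This is only a sketch: the class name LegacyCalendarSketch and the method startOfDayCalendar are invented for the example, and UTC is merely my guess at a zone, based on the UTC in your method names.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Calendar;
import java.util.GregorianCalendar;

class LegacyCalendarSketch {

    // Hypothetical helper: parses a yyyyMMdd string and returns a Calendar
    // set to the start of that day in the given time zone.
    static Calendar startOfDayCalendar(String yyyyMMdd, ZoneId zone) {
        LocalDate date = LocalDate.parse(yyyyMMdd, DateTimeFormatter.BASIC_ISO_DATE);
        ZonedDateTime startOfDay = date.atStartOfDay(zone);
        return GregorianCalendar.from(startOfDay);
    }

    public static void main(String[] args) {
        // Assumption: UTC stands in for whatever zone your legacy API expects.
        Calendar cal = startOfDayCalendar("20180808", ZoneOffset.UTC);
        System.out.println(cal.get(Calendar.YEAR));          // 2018
        System.out.println(cal.get(Calendar.MONTH) + 1);     // 8 (Calendar months are 0-based)
        System.out.println(cal.get(Calendar.DAY_OF_MONTH));  // 8
    }
}
```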

What went wrong in your code?

There’s nothing wrong with your code except that it’s overly complicated and uses the outdated date and time classes.

    String patternString = "(\\d{4})(\\d{2})(\\d{2})";
    Pattern pattern = null;
    Matcher matcher = null;
    try {
        pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
        matcher = pattern.matcher(sourceString);
        while (matcher.find()) {
            System.out.println("group(1): " + matcher.group(1));
            System.out.println("group(2): " + matcher.group(2));
            System.out.println("group(3): " + matcher.group(3));
        }
    } catch (Exception e) {
        // TODO handle exception
        throw e;
    }

The output from this snippet was:

group(1): 2018
group(2): 08
group(3): 08

Link

[Oracle tutorial: Date Time](https://docs.oracle.com/javase/tutorial/datetime/) explaining how to use java.time.

Ole V.V.

There is nothing wrong with your regex (as you mentioned, (\d{4})(\d{2})(\d{2})). What you are doing wrong is that you are not retrieving the captured groups properly: matcher.group() without an argument returns only the whole match. Refactor your method like this:

private static List<String> regMatch(String sourceString, String patternString) {
    List<String> matchStrList = new ArrayList<>();

    Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(sourceString);

    if (matcher.find()) {
        for (int i = 1; i <= matcher.groupCount(); i++) {
            matchStrList.add(matcher.group(i));
        }
    }

    return matchStrList;
}

You might wonder where group 0 is. In the Pattern API, capturing groups are numbered by counting their opening parentheses from left to right, and group 0 always stands for the entire match. So for the pattern (A)(B(C)) you get the following groups:

Group 0: (A)(B(C))
Group 1: (A)
Group 2: (B(C))
Group 3: (C)

The groupCount() method returns the number of capturing groups in the matcher's pattern. Group 0 is not included in this count, which is why the loop above runs from 1 to groupCount() inclusive.
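
If you want to see the numbering for yourself, a small sketch like the one below prints group 0 followed by every capturing group. The class name GroupNumberingSketch and the method allGroups are made up for this illustration, and the input "ABC" is just a string I picked that the pattern (A)(B(C)) matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingSketch {

    // Hypothetical helper: returns group 0 (the whole match) followed by
    // each capturing group of the first match, or an empty list if nothing matches.
    static List<String> allGroups(String input, String regex) {
        List<String> groups = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (matcher.find()) {
            // Start at 0 here to include the whole match; groupCount() excludes group 0.
            for (int i = 0; i <= matcher.groupCount(); i++) {
                groups.add(matcher.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(allGroups("ABC", "(A)(B(C))"));                      // [ABC, A, BC, C]
        System.out.println(allGroups("20180808", "(\\d{4})(\\d{2})(\\d{2})"));  // [20180808, 2018, 08, 08]
    }
}
```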

Side Note

As @haba713 mentioned in the comments, you might not want to go through all this regex hassle just to parse a date. You can simply use SimpleDateFormat for this.

SimpleDateFormat formater = new SimpleDateFormat("yyyyMMdd");
System.out.println(formater.parse(dateString));
Shafin Mahmud
  • Oh no. It’s correct that you should not want the regex hassle and should prefer a library class for parsing your date string. However, you should also want to avoid the long outdated and notoriously troublesome `SimpleDateFormat` class. And no matter if you use that one or the modern replacement, the `DateTimeFormatter` class, format pattern strings are case sensitive, and `YYYYMMDD` has the wrong case for a couple of letters and will give exceptions or incorrect results in most cases. – Ole V.V. Sep 02 '18 at 12:39
  • Thanks, that's it! I was confused by the debugger, and your supplementary note is great! – manpakhong Sep 02 '18 at 13:37
  • @OleV.V. The question was about capturing regex subgroups from text. Solving this with Java date parsers is a different issue and approach. It's always better and can be tried in many ways. But I pointed out what exactly he was doing wrong in his code. – Shafin Mahmud Sep 02 '18 at 14:17
  • Sorry that I was unclear, @ShafinMahmud. Your comment is to the point. My comment was only concerned with the last four lines of your answer, which I should of course have made clear. – Ole V.V. Sep 02 '18 at 14:33