
I have a string with the following format: yyyyMMdd_HHmm_ss_unitCode_(status). I need to map each component to a property of a dedicated class.

I thought of defining my token with a regular expression like this: {d+}4{d+}2{d+}2_{d+}2_{d+}2_{s+}3_{s+}2 (apologies for the approximate regex syntax; d stands for a decimal digit and s for a string).

How can I tell my parser that the first group {d+}4 must go into the "year" property of my class, the second into "month", and so forth?

Obviously, I could just do this: token.setYear(substring(0,4)), but I wanted to be a little more generic since I do not have control over the structure of the filename. I also considered defining an XML structure with startPosition, endPosition, the attribute name to store, and its type.

All in all, this seemed much too complicated. The problem is that I do not have a single separator that would let me use String.split.

Mike Samuel
charly's
  • You have to parse a string, you plan to use a regex, but you don't know the structure of the string? This is really confusing. If you don't know the structure of the string, I don't see how you could parse it. – JB Nizet Jan 12 '12 at 17:21
  • Good question. One thing I don't get: "I do not have control over the structure of the filename". So the string is a filename? – calebds Jan 12 '12 at 17:29
  • I know that the structure can change, that is why I want to define it via a regex to be able to adapt my code as the structure changes. – charly's Jan 12 '12 at 18:02
  • and yes the string is a filename :) – charly's Jan 12 '12 at 18:02

1 Answer

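import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input; // the filename to parse: yyyyMMdd_HHmm_ss_unitCode_(status)
SpecialClass output = new SpecialClass(); // your own class holding the parsed fields

// One capturing group per component; \\( and \\) match the literal parentheses around status.
String regex = "(\\d{4})(\\d{2})(\\d{2})_(\\d{2})(\\d{2})_(\\d{2})_([^_]+)_\\((.+)\\)";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(input);

// matches() succeeds only if the whole input fits the pattern.
if (m.matches())
{
    output.year = m.group(1);
    output.month = m.group(2);
    // etc
}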

Example input:

String input = "20120113_1234_27_500_(33)";

Will produce the following groups:

Group 1: 2012 //year
Group 2: 01   //month
Group 3: 13   //day
Group 4: 12   //hour
Group 5: 34   //minute
Group 6: 27   //second
Group 7: 500  //unitcode
Group 8: 33   //status

Test program: http://pastebin.com/upC5R9rP
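If the goal is to avoid hard-coding which group number belongs to which property, one possible variation is to use named capturing groups (available since Java 7). The sketch below is only an illustration under assumptions: the class name FilenameParser, the FIELDS list, and the choice to return a Map instead of filling SpecialClass are all hypothetical and not part of the original answer.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: maps each named group of the pattern to a property name.
class FilenameParser {

    // Same structure as the regex above, but every group carries a name.
    static final Pattern PATTERN = Pattern.compile(
            "(?<year>\\d{4})(?<month>\\d{2})(?<day>\\d{2})"
            + "_(?<hour>\\d{2})(?<minute>\\d{2})"
            + "_(?<second>\\d{2})"
            + "_(?<unitCode>[^_]+)"
            + "_\\((?<status>.+)\\)");

    // Property names to extract; they must match the group names in PATTERN.
    static final List<String> FIELDS = Arrays.asList(
            "year", "month", "day", "hour", "minute", "second", "unitCode", "status");

    // Returns property name -> matched text, in the order given by FIELDS.
    static Map<String, String> parse(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected filename format: " + input);
        }
        Map<String, String> values = new LinkedHashMap<>();
        for (String field : FIELDS) {
            values.put(field, m.group(field));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parse("20120113_1234_27_500_(33)"));
    }
}
```

With this layout, a change in the filename structure would only require editing the pattern (and the field list if components are added or removed); the resulting map could then be copied into the properties of the dedicated class by name.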

theglauber
  • Tweak your regexp as appropriate, depending on how much variability you want to allow. – theglauber Jan 12 '12 at 17:29
  • I think all the `\d`'s need to be `\\d` as well as the `\(` and what is the `([^)+)` at the end supposed to do? – Mike Samuel Jan 12 '12 at 19:44
  • You're correct... the backslashes needed to be doubled. The thing at the end is for matching characters up to but not including the closing ")". It was missing a closing "]". – theglauber Jan 12 '12 at 21:53
  • I had to make a few more fixes. As far as I can tell, the regexp is correct now. Here's a test program: http://pastebin.com/upC5R9rP – theglauber Jan 13 '12 at 19:07