I am looking for a GWT-compatible replacement for StringTokenizer that also returns the delimiters as tokens. The task cannot be solved with regular expressions alone, because the grammar is not regular: matching nested angle brackets requires keeping track of the nesting depth.

Example: extract the first level of a generic type definition. For the input "List<String>, Map<Integer, Map<Character, Boolean>>, Set<List<Double>>" I want a list with three items: "List<String>", "Map<Integer, Map<Character, Boolean>>" and "Set<List<Double>>".

Stripped down example code:

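import java.util.LinkedList;
import java.util.List;
import java.util.StringTokenizer; // not part of GWT's JRE emulation, hence the question

private static List<String> extractFirstLevel(String type) {
    List<String> res = new LinkedList<String>();
    StringTokenizer st = new StringTokenizer(type, "<>,", true); // true: return delimiters as tokens
    int nesting = 0;        // we are only interested in nesting 0
    String lastToken = "";  // text of the current top-level item collected so far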
    while (st.hasMoreTokens()) {
        String token = st.nextToken();
        if (token.equals("<")) {
            nesting++;  // one level deeper: commas are not split points until the matching >
            lastToken = lastToken + "<";
        } else if (token.equals(">")) {
            nesting--;  // up one level
            lastToken = lastToken + ">";
        } else if (token.equals(",")) {
            if (nesting == 0) {  // we are interested in the top level
                res.add(lastToken);
                lastToken = "";
            } else { // this is a , inside a < >, so we are not interested
                lastToken = lastToken + ", ";
            }
        } else {
            lastToken = lastToken + token.trim();
        }
    }
    res.add(lastToken);
    return res;
}
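The code above only needs the three-argument form StringTokenizer(str, delims, true). A stand-in for that form can be sketched with String.length, charAt, indexOf and substring, which GWT's JRE emulation is expected to provide. The class name DelimiterTokenizer is hypothetical, and this is a minimal sketch rather than a full reimplementation of StringTokenizer (no countTokens, no changing delimiters mid-stream).

```java
import java.util.NoSuchElementException;

/**
 * Hypothetical minimal stand-in for new StringTokenizer(str, delims, true):
 * each delimiter character is returned as its own one-character token,
 * and each run of non-delimiter characters is returned as one token.
 */
public class DelimiterTokenizer {
    private final String str;
    private final String delims;
    private int pos = 0; // index of the next unread character

    public DelimiterTokenizer(String str, String delims) {
        this.str = str;
        this.delims = delims;
    }

    private boolean isDelimiter(char c) {
        return delims.indexOf(c) >= 0;
    }

    public boolean hasMoreTokens() {
        return pos < str.length();
    }

    public String nextToken() {
        if (!hasMoreTokens()) {
            throw new NoSuchElementException();
        }
        int start = pos;
        if (isDelimiter(str.charAt(pos))) {
            pos++; // a delimiter is a token by itself
        } else {
            // consume a run of non-delimiter characters
            while (pos < str.length() && !isDelimiter(str.charAt(pos))) {
                pos++;
            }
        }
        return str.substring(start, pos);
    }

    public static void main(String[] args) {
        DelimiterTokenizer st = new DelimiterTokenizer("Map<Integer, Boolean>", "<>,");
        while (st.hasMoreTokens()) {
            System.out.println("[" + st.nextToken() + "]");
        }
    }
}
```

Running main prints the tokens "Map", "<", "Integer", ",", " Boolean" and ">" one per line, the same sequence StringTokenizer yields with returnDelims set to true, so the loop in extractFirstLevel should work unchanged with this class swapped in.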
  • You could use the String.split() method. It is usually a good alternative to StringTokenizer, but I'm not sure how to make it return the delimiters. – george_h May 27 '12 at 21:02
  • I don't think split() will get me anywhere because it is designed for regular expressions only. – Wassersturm May 28 '12 at 08:35
  • True, though I meant you can split using "[<>,]" as your regex. That will work, but in your tokenizer you want it to return the delimiters, right? If that is 100% needed, then you're stuck with StringTokenizer and no alternative exists (that I know of). – george_h May 28 '12 at 13:54
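The difference discussed in these comments can be shown on the JVM. A character class "[<>,]" splits on any one of the three characters but discards them, whereas zero-width lookbehind/lookahead splits just before and after each delimiter and so keeps them as tokens. Caveat, stated as an assumption: GWT implements String.split on top of JavaScript regular expressions, and lookbehind "(?<=...)" only arrived in JavaScript with ES2018, so the second variant may not work in older browsers. SplitDemo and its method names are hypothetical.

```java
import java.util.Arrays;

public class SplitDemo {
    // A character class matches any single '<', '>' or ','; the literal
    // pattern "<>," would only match that exact three-character sequence.
    public static String[] dropDelimiters(String s) {
        return s.split("[<>,]");
    }

    // Zero-width lookbehind/lookahead: split just before and just after each
    // delimiter, so the delimiters survive as tokens of their own.
    public static String[] keepDelimiters(String s) {
        return s.split("(?<=[<>,])|(?=[<>,])");
    }

    public static void main(String[] args) {
        String type = "A<B>,C";
        System.out.println(Arrays.toString(dropDelimiters(type))); // [A, B, , C]
        System.out.println(Arrays.toString(keepDelimiters(type))); // [A, <, B, >, ,, C]
    }
}
```

Note the empty string in the first result: with the delimiters discarded, the ">," pair leaves nothing between them, and the nesting information the question depends on is lost.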

1 Answer

I ended up iterating over the characters of the string myself, tracking the nesting depth and splitting only at top-level commas:

import java.util.LinkedList;
import java.util.List;

private static List<String> extractFirstLevelNew(String type) {
    List<String> res = new LinkedList<String>();
    int start = 0;   // index where the current top-level item begins
    int nesting = 0; // current depth inside < ... >
    for (int i = 0; i < type.length(); i++) {
        char chr = type.charAt(i);
        if (chr == '<') {
            nesting++;
        } else if (chr == '>') {
            nesting--;
        } else if (chr == ',' && nesting == 0) {
            // a top-level comma ends the current item
            res.add(type.substring(start, i).trim());
            start = i + 1;
        }
    }
    res.add(type.substring(start).trim()); // the last item has no trailing comma
    return res;
}
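As written, extractFirstLevelNew silently accepts unbalanced input such as "Map<String", where nesting never returns to 0. A stricter variant is sketched below: the depth-counting loop is the same, while the two bracket checks and the name extractFirstLevelStrict are additions for illustration, not part of the answer. The main method runs it on the example from the question.

```java
import java.util.ArrayList;
import java.util.List;

public class FirstLevel {
    /**
     * Same depth-counting loop as extractFirstLevelNew, plus checks
     * (an addition) that reject unbalanced angle brackets.
     */
    public static List<String> extractFirstLevelStrict(String type) {
        List<String> res = new ArrayList<String>();
        int start = 0;   // index where the current top-level item begins
        int nesting = 0; // current depth inside < ... >
        for (int i = 0; i < type.length(); i++) {
            char chr = type.charAt(i);
            if (chr == '<') {
                nesting++;
            } else if (chr == '>') {
                nesting--;
                if (nesting < 0) {
                    throw new IllegalArgumentException("unmatched '>' at index " + i);
                }
            } else if (chr == ',' && nesting == 0) {
                res.add(type.substring(start, i).trim());
                start = i + 1;
            }
        }
        if (nesting != 0) {
            throw new IllegalArgumentException("unclosed '<' in: " + type);
        }
        res.add(type.substring(start).trim());
        return res;
    }

    public static void main(String[] args) {
        String type = "List<String>, Map<Integer, Map<Character, Boolean>>, Set<List<Double>>";
        for (String item : extractFirstLevelStrict(type)) {
            System.out.println(item);
        }
    }
}
```

For the question's input this prints the three items "List<String>", "Map<Integer, Map<Character, Boolean>>" and "Set<List<Double>>", one per line. Failing fast is a design choice; returning the partial result, as the original does, may be preferable if the input is already known to be well-formed.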