I am trying to parse a string recursively with StringTokenizer. The string represents a tree, in the form:

[(0,1),[(00,01,02),[()],[()]]]

where a node's information is stored inside the parentheses, and the brackets after it hold the node's children, separated by commas. For instance, this string represents this tree:

[Image: tree diagram. The root (0,1) has one child (00,01,02), which has two leaf children.]

If a node has something inside the parentheses, it is a normal node; if the parentheses are empty, it is a leaf.

I've written the code below to parse it. It works until the recursion unwinds, at which point the tokenizer seems to have no tokens left to analyze. The problem is that when it reaches the final brackets (]]]), it jumps directly to the last one and skips the others.

import java.util.*;

public class ParseString
{

public void setParameters(String parameters) throws Exception {
    setParameters(new StringTokenizer(parameters, "[(,)]", true));

}

public void setParameters(StringTokenizer tokenizer) throws Exception{

    String buf;
    try{
      if (!(buf = tokenizer.nextToken()).equals("["))
        throw new Exception("Malformed string, found " + buf + "instead of [");
      boolean isLeaf = setWeights(tokenizer);
      System.out.println("Leaf: " + isLeaf);
      while (!(buf = tokenizer.nextToken()).equals("]")) {
        do{
           setParameters(tokenizer);
        }while (!(tokenizer.nextToken().equals("]")));
        if (!(buf = tokenizer.nextToken()).equals(","))
           break;
      } 
    }catch(Exception e){e.printStackTrace();}
   }


    public boolean setWeights(StringTokenizer tokenizer) throws 
 Exception{
        String buf;
        if(!(buf = tokenizer.nextToken()).equals("("))
        throw new Exception("Malformed string, found "+ buf + "instead of ("); 
    do{
        buf = tokenizer.nextToken();
        if(buf.equals(")")){
        return true;
    }
    if(!buf.equals(","))
        System.out.println(buf);
    }while(!tokenizer.nextToken().equals(")"));
    return false;
   }


   public static void main(String[] args)
   {
     ParseString ps = new ParseString();    
     try{
        ps.setParameters("[(0,1),[(00,01,02),[()],[()]]]");
     }catch(Exception e){e.printStackTrace();}
   }
 }

This is the output I have running it:

 0
 1
 Leaf: false
 00
 01
 02
 Leaf: false
 Leaf: true
 Leaf: true
 java.util.NoSuchElementException
    at java.util.StringTokenizer.nextToken(StringTokenizer.java:349)
    at ParseString.setParameters(ParseString.java:22)
    at ParseString.setParameters(ParseString.java:7)
    at ParseString.main(ParseString.java:51)

One more thing: the parser should be able to handle any tree in this format, not just this one. I'd be glad if someone could help me fix this.

GianniPele

2 Answers

I think you might consume ] twice in the nested loops in some cases, potentially consuming the closing bracket of the parent.
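
To check this, it can help to dump the token stream first. The sketch below (the class name TokenDump is just for illustration) uses the same delimiters as the question and shows that the tokenizer returns every ] as its own token, so nothing is skipped by the tokenizer itself:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenDump {
    // Splits the input on the question's delimiters and keeps the delimiters as tokens.
    static List<String> tokenize(String input) {
        StringTokenizer st = new StringTokenizer(input, "[(,)]", true);
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("[(0,1),[(00,01,02),[()],[()]]]");
        // The list ends with three separate "]" tokens.
        System.out.println(tokens);
    }
}
```

If I trace the question's code correctly, after the second leaf returns, the do/while condition consumes the ] of the (00,01,02) node and the following if consumes the root's ]. The outer do/while condition then calls nextToken() on an exhausted tokenizer, which matches the stack trace.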

I'd just make the structure more obvious, perhaps in the following way:

// Precondition: '[' expected
// Postcondition: Matching ']' consumed
void parseNode(StringTokenizer st) {
  if (!st.nextToken().equals("[")) {
    throw new RuntimeException("[ expected parsing node.");
  }
  boolean leaf = parseWeights(st);
  System.out.println("isleaf: " + leaf);

  // Behind ')': Parse children if any.

  String token = st.nextToken();
  while (token.equals(",")) {
    parseNode(st);
    token = st.nextToken();
  }
  if (!token.equals("]")) {
    throw new RuntimeException("] expected.");
  }
}

// Precondition: '(' expected
// Postcondition: Matching ')' consumed
boolean parseWeights(StringTokenizer st) {
  if (!st.nextToken().equals("(")) {
    throw new RuntimeException("( expected parsing node weights.");
  }
  String token = st.nextToken();
  if (token.equals(")")) {
    // Empty parentheses: this node is a leaf.
    return true;
  }
  while (true) {
    System.out.println(token);
    token = st.nextToken();
    if (token.equals(")")) {
      break;
    }
    if (!token.equals(",")) {
      throw new RuntimeException(", or ) expected parsing weights.");
    }
    token = st.nextToken();
  }
  return false;
}
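
Since the parser has to do more than print, the same structure can return a tree instead. This is only a sketch under my own assumptions: the TreeParser and Node names, the weights and children fields, and keeping the weights as strings are not from the question. It calls nextToken() directly, so truncated input still surfaces as a NoSuchElementException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TreeParser {
    static final String DELIMS = "[(,)]";

    // Hypothetical node type: the weights from "(...)" plus the child nodes.
    static class Node {
        final List<String> weights = new ArrayList<>();
        final List<Node> children = new ArrayList<>();
        boolean isLeaf() { return weights.isEmpty(); } // "()" means leaf
    }

    static Node parse(String input) {
        StringTokenizer st = new StringTokenizer(input, DELIMS, true);
        Node root = parseNode(st);
        if (st.hasMoreTokens()) {
            throw new IllegalArgumentException("Unexpected trailing token: " + st.nextToken());
        }
        return root;
    }

    private static void expect(String expected, String actual) {
        if (!expected.equals(actual)) {
            throw new IllegalArgumentException("Expected " + expected + " but found " + actual);
        }
    }

    // Precondition: '[' is next. Postcondition: the matching ']' is consumed.
    private static Node parseNode(StringTokenizer st) {
        expect("[", st.nextToken());
        Node node = new Node();
        parseWeights(st, node.weights);
        String token = st.nextToken();
        while (token.equals(",")) {      // each comma introduces one child
            node.children.add(parseNode(st));
            token = st.nextToken();
        }
        expect("]", token);
        return node;
    }

    // Precondition: '(' is next. Postcondition: the matching ')' is consumed.
    private static void parseWeights(StringTokenizer st, List<String> weights) {
        expect("(", st.nextToken());
        String token = st.nextToken();
        if (token.equals(")")) {
            return;                      // empty parentheses
        }
        while (true) {
            if (DELIMS.contains(token)) {
                throw new IllegalArgumentException("Weight expected but found " + token);
            }
            weights.add(token);
            token = st.nextToken();
            if (token.equals(")")) {
                return;
            }
            expect(",", token);
            token = st.nextToken();
        }
    }

    public static void main(String[] args) {
        Node root = parse("[(0,1),[(00,01,02),[()],[()]]]");
        System.out.println(root.weights + " has " + root.children.size() + " child(ren)");
    }
}
```

Because every method consumes exactly the tokens between its own opening and closing delimiter, this should work for any nesting depth and any number of children, not just the sample string.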
Stefan Haustein

You're calling tokenizer.nextToken() without checking whether another token is available (you can check this with tokenizer.hasMoreTokens()). Check it first, and if hasMoreTokens() returns false, exit the method with return;.
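
A minimal sketch of such a guard is below. The helper name nextOrFail is made up for illustration, and unlike the plain return; suggested above it throws, so that malformed input is still reported rather than silently accepted:

```java
import java.util.StringTokenizer;

class SafeTokens {
    // Hypothetical helper: returns the next token, or fails with a message that
    // names what was expected instead of a bare NoSuchElementException.
    static String nextOrFail(StringTokenizer st, String expected) {
        if (!st.hasMoreTokens()) {
            throw new IllegalStateException("Unexpected end of input, expected " + expected);
        }
        return st.nextToken();
    }

    public static void main(String[] args) {
        // Deliberately truncated input: the closing "]" is missing.
        StringTokenizer st = new StringTokenizer("[()", "[(,)]", true);
        nextOrFail(st, "[");
        nextOrFail(st, "(");
        nextOrFail(st, ")");
        try {
            nextOrFail(st, "]");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Note that a guard like this only changes how running out of tokens is reported; it does not by itself stop a closing bracket from being consumed in the wrong place.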

But IMO it's better to put all the tokens in a list first, then you can iterate through it in an easier way:

String s = "[(0,1),[(00,01,02),[()],[()]]]";
StringTokenizer strtok = new StringTokenizer(s, "[(,)]", true);
// put tokens in a list
List<String> list = new ArrayList<>();
while (strtok.hasMoreTokens()) {
    list.add(strtok.nextToken());
}
// parse it, starting at position 0
parse(list, 0);

// parse method
public void parse(List<String> list, int position) {
    if (position > list.size() - 1) {
        // no more elements, stop
        return;
    }

    String element = list.get(position);
    if (")".equals(element)) { // end of node
        // is leaf if previous element was the matching "("
        System.out.println("Leaf:" + "(".equals(list.get(position - 1)));
    } else if (!("[".equals(element) || "(".equals(element) || "]".equals(element) || ",".equals(element))) {
        // print only contents of a node (ignoring delimiters)
        System.out.println(element);
    }

    // parse next element
    parse(list, position + 1);
}

The output is:

0
1
Leaf:false
00
01
02
Leaf:false
Leaf:true
Leaf:true

If you want nested/indented output, you can add a level parameter to the parse method:

public void parse(List<String> list, int position, int level) {
    if (position > list.size() - 1) {
        return;
    }
    String element = list.get(position);
    int nextLevel = level;

    if ("[".equals(element)) {
        nextLevel++;
    } else if ("]".equals(element)) {
        nextLevel--;
    } else if (")".equals(element)) {
        for (int i = 0; i < nextLevel; i++) {
            System.out.print("  ");
        }
        System.out.println("Leaf:" + "(".equals(list.get(position - 1)));
    } else if (!("(".equals(element) || "]".equals(element) || ",".equals(element))) {
        for (int i = 0; i < nextLevel; i++) {
            System.out.print("  ");
        }
        System.out.println(element);
    }

    parse(list, position + 1, nextLevel);
}

Then, if I call (using the same list as above):

// starting at position zero and level zero
parse(list, 0, 0);

The output will be:

  0
  1
  Leaf:false
    00
    01
    02
    Leaf:false
      Leaf:true
      Leaf:true

All elements at the same level will have the same indentation.
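
Since each recursive call only moves one position forward, the same traversal can also be written as a plain loop, which avoids deep recursion on long inputs. A sketch, with the IndentedDump class and render method as illustrative names only; it collects the lines in a list instead of printing them directly:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class IndentedDump {
    // Loop version of parse(list, position, level): one pass over the tokens.
    static List<String> render(String input) {
        StringTokenizer st = new StringTokenizer(input, "[(,)]", true);
        List<String> lines = new ArrayList<>();
        int level = 0;
        String previous = null; // token seen just before the current one
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            switch (token) {
                case "[":
                    level++;
                    break;
                case "]":
                    level--;
                    break;
                case ")":
                    // Leaf if the parentheses were empty, i.e. the previous token was "(".
                    lines.add(indent(level) + "Leaf:" + "(".equals(previous));
                    break;
                case "(":
                case ",":
                    break; // delimiters carry no content
                default:
                    lines.add(indent(level) + token);
            }
            previous = token;
        }
        return lines;
    }

    private static String indent(int level) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < level; i++) {
            sb.append("  ");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String line : render("[(0,1),[(00,01,02),[()],[()]]]")) {
            System.out.println(line);
        }
    }
}
```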

    Thank you for the answer, but I more or less have to keep the structure above, because I need to do much more than just extract the weights and show them. But your solution is very good! – GianniPele Jul 02 '17 at 12:44