May be vaguely related to question 3100585.

The purpose of the following class is to take a String containing a line of Java source code and divide it into token strings that will be parsed further by a separate class. The regular expression passed to the split method divides the string at operator characters and whitespace while retaining every character. The class then iterates through the resulting tokens, removes any whitespace and end-of-line characters, and returns what is left as a List.

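import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class Lexer {

    Lexer() {
    }

    public List<String> convertStringToTokens(String input) {
        input = input.trim();

        // Split before and after every operator, bracket, or whitespace character, keeping the delimiters.
        String[] result = input.split("(?<=[-+*\\/=\\s\\<\\>\\(\\)])|(?=[-+*\\/=\\s\\<\\>\\(\\)])");
        List<String> resultList = new LinkedList<>(Arrays.asList(result));

        // Remove empty strings and single whitespace / end-of-line tokens.
        for (Iterator<String> iterator = resultList.iterator(); iterator.hasNext();) {
            String string = iterator.next();
            if (string.isEmpty() || string.matches("\\u000A") || string.matches("\\u000D") || string.matches(" ") || string.matches("\\u000B")) {
                iterator.remove();
            }
        }

        return resultList;
    }
}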
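As a point of reference, the lookaround technique on its own can be sketched as follows. This is a minimal illustration, not part of the class above: the class name SplitDemo, the method name splitKeepingDelimiters, the reduced operator set, and the sample expression are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

public class SplitDemo {

    // Hypothetical helper: splits at zero-width positions directly after
    // (lookbehind) or directly before (lookahead) an operator character,
    // so the operators survive as tokens of their own.
    static List<String> splitKeepingDelimiters(String input) {
        return Arrays.asList(input.split("(?<=[-+*/=])|(?=[-+*/=])"));
    }

    public static void main(String[] args) {
        // Prints [a, +, b, =, c]
        System.out.println(splitKeepingDelimiters("a+b=c"));
    }
}
```

Because both alternatives are zero-width, nothing is consumed by the split, which is why every character of the input is retained.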

Unfortunately, the class does not behave as intended, and I am not sure why.

Most likely the regular expression is at fault.

If anyone knows where I went wrong, please advise.

Edit: The input is a single string such as "Sphere s = new Sphere(16);". The output is a List of Strings (at most two Strings long), which for the above input would be

{"Sphere s = new Sphere(16",");"}.

(The separation of the closing parenthesis from the parameter is intended. Incidentally, does anyone know how to separate the parameter from the opening parenthesis as well?)

1 Answer

I found a solution: moving the space match out of the lookaround assertions (it had been added to them after the question was asked) and into a separate alternative allowed me to remove space characters and split the string around them as well.

String[] result = input.split("(?<=[ -+*\\/=\\s\\<\\>\\(\\)])|(?=[ -+*\\/=\\s\\<\\>\\(\\)])");

becomes

String[] result = input.split("(?<=[-+*\\/=\\s\\<\\>\\(\\)])|(?=[-+*\\/=\\s\\<\\>\\(\\)])| ");
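To check the result, the final pattern can be run against the sample input from the question. The following is only a sketch under a few assumptions: the class name LexerSketch and the method name tokenize are made up for the example, and the filtering step is condensed into a single removeIf call that drops empty and whitespace-only pieces rather than the iterator loop from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LexerSketch {

    // Final pattern from above, with backslashes doubled for a Java string literal.
    private static final String SPLIT_PATTERN =
            "(?<=[-+*\\/=\\s\\<\\>\\(\\)])|(?=[-+*\\/=\\s\\<\\>\\(\\)])| ";

    // Hypothetical helper: split the line, then discard empty and
    // whitespace-only pieces left over from the split.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>(Arrays.asList(input.trim().split(SPLIT_PATTERN)));
        tokens.removeIf(token -> token.trim().isEmpty());
        return tokens;
    }

    public static void main(String[] args) {
        // Sample input taken from the question.
        // Prints [Sphere, s, =, new, Sphere, (, 16, ), ;]
        System.out.println(tokenize("Sphere s = new Sphere(16);"));
    }
}
```

With this input the parameter 16 ends up separated from both the opening and the closing parenthesis. The filtering step is still needed in this sketch, since the split alone may leave whitespace-only pieces in the array.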