    import java.util.StringTokenizer;

    public class TokenizerTest {
        public static void main(String[] args) {
            String s = "test -||- testing again -|- test_1 -||- testing again_1";
            StringTokenizer tokenizer = new StringTokenizer(s, "-|-");
            System.out.println(tokenizer.countTokens());

            while (tokenizer.hasMoreTokens()) {
                System.out.println(tokenizer.nextToken());
            }
        }
    }

Output:

4
test 
 testing again 
 test_1 
 testing again_1

Shouldn't the count be 2?

I also tried printing the tokens, and all four pieces of the string were printed, not just the ones I expected to be tokens.

I also read the following in the Java API doc:

delimiter characters serve to separate tokens. A token is a maximal sequence of consecutive characters that are not delimiters

If that is the case, shouldn't my delimiter "-|-" split the string into 2 tokens?

Thirumalai Parthasarathi
  • did you print the tokens? – vefthym Nov 04 '14 at 14:05
  • @vefthym Yes, I did. Please check my edit. – Thirumalai Parthasarathi Nov 04 '14 at 14:09
  • Possible duplicate of this: http://stackoverflow.com/questions/13066929/string-tokenizer-delimiter – Raibaz Nov 04 '14 at 14:10
  • Why should it print 2 instead of 4 if there are 4 tokens (`1||2|3||4`)? Note that double pipes (`||`) are considered to be separators with an empty (and thus ignored) token in between. – Thomas Nov 04 '14 at 14:10
  • The [javadoc](http://docs.oracle.com/javase/7/docs/api/java/util/StringTokenizer.html) itself says `StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.`. – sp00m Nov 04 '14 at 14:12
  • @sp00m: I understand that this is a legacy class. What intrigues me is why this scenario does not produce the desired results. – Thirumalai Parthasarathi Nov 04 '14 at 14:18
  • @Thomas: Could you look at my edit? – Thirumalai Parthasarathi Nov 04 '14 at 14:18
  • @BlackPanther look at Seelenvirtuose's answer: StringTokenizer uses individual _characters_ as delimiters, your quote states that the tokens _in between_ might be a longer sequence of characters, but delimiters are any sequence of the characters passed in the constructor, i.e. `-|-` will result in the delimiter matching any sequence of minuses and pipes, e.g. `|---|--|||-` would also be a valid delimiter. To split at whole strings you need to use `String#split()` etc. – Thomas Nov 04 '14 at 14:22
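Thomas's point that any run of minuses and pipes forms a single delimiter can be checked with a small sketch. The class name `DelimiterRunDemo`, the helper `tokens` and the input `"a|---|--|||-b"` are hypothetical and chosen only for illustration; the code uses nothing beyond the standard `java.util.StringTokenizer` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class: shows that a run of delimiter characters acts as one separator.
public class DelimiterRunDemo {
    // Collects every token that StringTokenizer returns for the given delimiter set.
    static List<String> tokens(String input, String delimiters) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, delimiters);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // "-|-" is read as the character set {'-', '|'}, so the whole run
        // "|---|--|||-" is skipped as delimiters and only "a" and "b" remain.
        System.out.println(tokens("a|---|--|||-b", "-|-")); // prints [a, b]
    }
}
```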

1 Answer


A StringTokenizer uses a set of delimiter characters, not a delimiting string as you seem to assume.

So it splits at every occurrence of any of your delimiter characters. A run of consecutive delimiters such as "-||-" produces no empty tokens in between, which is why you got exactly the four tokens shown.
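A minimal sketch of the "set of characters" behaviour, using the question's string. It relies on the three-argument constructor `StringTokenizer(String str, String delim, boolean returnDelims)`: with `returnDelims` set to `true`, each delimiter character is returned as a token of length one, which makes the per-character splitting visible. The class `TokenizerCharSetDemo` and its helper `tokens` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class: StringTokenizer treats its delimiter argument as a set of characters.
public class TokenizerCharSetDemo {
    // Collects all tokens; when returnDelims is true, delimiter characters are included one by one.
    static List<String> tokens(String input, String delimiters, boolean returnDelims) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, delimiters, returnDelims);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "test -||- testing again -|- test_1 -||- testing again_1";
        // "-|-" and "-|" describe the same character set {'-', '|'}
        System.out.println(tokens(s, "-|-", false).size()); // prints 4
        System.out.println(tokens(s, "-|", false).size());  // prints 4
        // With returnDelims = true the run "-||-" comes back as four one-character tokens
        System.out.println(tokens("a -||- b", "-|", true)); // prints [a , -, |, |, -,  b]
    }
}
```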

If you want to split the string on a delimiting string, you must use String.split, which takes a regular expression:

String s = "test -||- testing again -|- test_1 -||- testing again_1";
String[] split = s.split("-\\|-"); // "|" is a special char in regex
System.out.println(split.length);

The output is 2.
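Instead of escaping the pipe by hand, `java.util.regex.Pattern.quote` turns any string into a literal pattern, so the delimiter can be passed as-is. This is a short sketch; `LiteralSplitDemo` and `splitLiteral` are hypothetical names. Note that "-||-" does not contain "-|-" as a substring, which is why only the middle separator matches.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class: split on an exact delimiter string without hand-escaping.
public class LiteralSplitDemo {
    // Pattern.quote escapes regex metacharacters such as '|' and '.' in the delimiter.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String s = "test -||- testing again -|- test_1 -||- testing again_1";
        String[] parts = splitLiteral(s, "-|-");
        System.out.println(parts.length); // prints 2
        System.out.println(Arrays.toString(parts));
    }
}
```

The pieces keep their surrounding spaces; call `trim()` on each one if that matters.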

Seelenvirtuose
  • I am sure split will work. But why is the tokenizer not working? Could you elaborate a little? – Thirumalai Parthasarathi Nov 04 '14 at 14:19
  • @BlackPanther reread the first sentence: "A StringTokenizer uses a set of _delimiter characters_" - that means you're basically defining a regex like `[-\|]+` if you pass `-|-` as the delimiter (although no regex is used but the effect would be roughly the same). – Thomas Nov 04 '14 at 14:25
  • @BlackPanther A string tokenizer uses _each single character_ in the delimiter set for tokenizing. In your example it uses '-' and '|' for tokenizing (the second '-' is thus superfluous). A split uses the _whole regular expression_ for splitting. – Seelenvirtuose Nov 04 '14 at 14:25
  • @BlackPanther If I remember correctly, the second argument of StringTokenizer `"-|-"` is not the delimiter itself but the set of characters that should be used as delimiters, so it is the same as `"-|"`, which means you are splitting on `-` or `|`. – Pshemo Nov 04 '14 at 14:25
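Thomas's and Pshemo's comments suggest that `new StringTokenizer(s, "-|-")` behaves roughly like splitting on the regex `[-|]+`. The sketch below compares the two on the question's string; the class and method names are hypothetical. It also shows one difference: `String.split` keeps a leading empty string when the input starts with a delimiter, while `StringTokenizer` never returns empty tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class comparing StringTokenizer with a character-class regex.
public class CharClassEquivalenceDemo {
    // Tokens produced by StringTokenizer for the given delimiter characters.
    static List<String> viaTokenizer(String input, String delimiters) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, delimiters);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    // Splits on one or more characters from the set {'-', '|'}.
    static List<String> viaSplit(String input) {
        return Arrays.asList(input.split("[-|]+"));
    }

    public static void main(String[] args) {
        String s = "test -||- testing again -|- test_1 -||- testing again_1";
        // Both approaches yield the same four pieces for the question's string
        System.out.println(viaTokenizer(s, "-|-").equals(viaSplit(s))); // prints true

        // A leading delimiter makes split return an extra empty first element
        String leading = "-a|b";
        System.out.println(viaTokenizer(leading, "-|")); // prints [a, b]
        System.out.println(viaSplit(leading).size());    // prints 3
    }
}
```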