Get Position in Original String from `StringTokenizer`

Question

I need to get the space-separated tokens in a string, but I also need to know the character position within the original string at which each token starts. Is there any way to do this with StringTokenizer? Also, as I understand it, this is a legacy class; is there a better alternative to StringTokenizer?

Most would use `String#split(...)` instead of StringTokenizer. I think that the API even states this. **Edit:** and in fact the API does state this: `"StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead. "` — Hovercraft Full Of Eels, Dec 23 '12 at 11:20

Rohit Jain · Accepted Answer · 2012-12-23T11:40:40.967

In new code, prefer String#split() over StringTokenizer for splitting a string.

However, since you also need the position of each token in the original string, it is better to use the Pattern and Matcher classes. Matcher#start() returns the index at which the current match begins.

Here's an example:

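import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "abc asf basdfasf asf";
// \S+ matches each maximal run of non-whitespace characters, i.e. one token
Matcher matcher = Pattern.compile("\\S+").matcher(str);

while (matcher.find()) {
    // start() is the index of the token's first character in str
    System.out.println(matcher.start() + ":" + matcher.group());
}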

The pattern \\S+ matches each run of non-whitespace characters in the string. Each call to Matcher#find() advances to the next match, so the loop visits every token in order.
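If the end position of each token is needed as well, Matcher#end() can be recorded alongside Matcher#start(). The following is a minimal sketch rather than part of the original answer; the class name TokenPositions, the nested Token type, and the tokenize method are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenPositions {

    /** A token plus its half-open [start, end) range in the original string (hypothetical helper type). */
    static final class Token {
        final String text;
        final int start;
        final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // One or more non-whitespace characters, i.e. one token.
    private static final Pattern NON_WHITESPACE = Pattern.compile("\\S+");

    /** Collects every whitespace-separated token together with its position in the input. */
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher matcher = NON_WHITESPACE.matcher(input);
        while (matcher.find()) {
            // start() is inclusive, end() is exclusive
            tokens.add(new Token(matcher.group(), matcher.start(), matcher.end()));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("abc  asf basdfasf asf")) {
            System.out.println(t.start + "-" + t.end + ": " + t.text);
        }
    }
}
```

Because the pattern matches the tokens themselves rather than the separators, runs of several spaces, tabs, or newlines need no special handling.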

I picked this answer because it also handles cases where there may be multiple spaces in between words. — Paul Manta, Dec 23 '12 at 11:37

score 1 · Answer 2 · answered Dec 23 '12 at 11:28

You can easily do this yourself using String.split():

String text = "hello world example";
int tokenStartIndex = 0;
for (String token : text.split(" ")) {
    System.out.println("token: " + token + ", tokenStartIndex: " + tokenStartIndex);
    tokenStartIndex += token.length() + 1; // +1 for the single space that follows each token
}

This prints:

token: hello, tokenStartIndex: 0
token: world, tokenStartIndex: 6
token: example, tokenStartIndex: 12

micha

What if there are two subsequent spaces in the String? – Adriaan Koster Dec 01 '15 at 11:36
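To illustrate the case raised in this comment: String.split(" ") treats every single space as a separator, so two adjacent spaces produce an empty token between them. Each empty token still accounts for exactly one separator, so a running offset stays correct as long as empty tokens are skipped when reporting. This is a minimal sketch assuming plain spaces are the only separator; the class name SplitOffsets and the method offsets are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitOffsets {

    /** Returns "start:token" entries for every non-empty token, splitting on single spaces. */
    static List<String> offsets(String text) {
        List<String> result = new ArrayList<>();
        int start = 0;
        for (String token : text.split(" ")) {
            if (!token.isEmpty()) {
                result.add(start + ":" + token);
            }
            // Every element, empty or not, is followed by exactly one space.
            start += token.length() + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [0:hello, 7:world, 17:example]
        System.out.println(offsets("hello  world     example"));
    }
}
```

Note that split drops trailing empty strings, which does not matter here because no token follows them.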

score 0 · Answer 3 · answered Apr 05 '16 at 13:06

I improved micha's answer, so that it can handle neighboring spaces:

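String text = "hello  world     example";
int from = 0; // index at which the next search begins
for (String token : text.split("[\u00A0 \n]")) {
    if (token.length() > 0) {
        int start = text.indexOf(token, from);
        System.out.println("token: " + token + ", start at: " + start);
        from = start + token.length(); // move past this token so a repeated token is not found again
    }
}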

Output is:

token: hello, start at: 0
token: world, start at: 7
token: example, start at: 17

juice

What if there are multiple tokens with the same value? indexOf will incorrectly give the index of the first occurrence. – Paul Jul 04 '16 at 20:56
@Paul No. That's why there is `indexOf` with two arguments. Did you try it at least? – juice Jul 06 '16 at 07:52
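On the repeated-token question: the two-argument indexOf distinguishes equal tokens only if the search offset moves past the previously found token before the next lookup. If the offset stays at the previous token's start, a second identical token is reported at the first one's position. The following is a minimal sketch of the advancing variant, assuming plain spaces as separators; the class name RepeatedTokens and the method startIndexes are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class RepeatedTokens {

    /** Returns the start index of every space-separated token, including repeated ones. */
    static List<Integer> startIndexes(String text) {
        List<Integer> starts = new ArrayList<>();
        int from = 0; // search offset, always just past the previous token
        for (String token : text.split(" ")) {
            if (token.isEmpty()) {
                continue; // produced by consecutive spaces
            }
            int start = text.indexOf(token, from);
            starts.add(start);
            from = start + token.length();
        }
        return starts;
    }

    public static void main(String[] args) {
        // prints [0, 4, 9]
        System.out.println(startIndexes("asf abc  asf"));
    }
}
```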
