
Suppose I want to tokenize a text where everything other than [a-zA-Z] is treated as a delimiter. How do I write the StringTokenizer in Java? Would it look something like this: StringTokenizer st = new StringTokenizer(data, "[[^a-z]&&[^A-Z]");?

Pshemo
Chris Olszewski

2 Answers


Try the regexp [^a-zA-Z]+ with String#split; it matches any run of one or more non-letter characters:

String text = "hello, world^ i love: #66 you";
for (String str : text.split("[^a-zA-Z]+")) {
    System.out.println(str);
}
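
One caveat worth noting: if the text starts with a delimiter, `split` returns an empty first element. A minimal sketch of a wrapper that drops such empty tokens follows; the class name `LetterSplit` and the method `tokenize` are made up for illustration and are not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;

class LetterSplit {
    // Splits on runs of non-letters and skips any empty token,
    // such as the one produced when the text begins with a delimiter.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String s : text.split("[^a-zA-Z]+")) {
            if (!s.isEmpty()) {
                tokens.add(s);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The input starts with "#66 ", so a plain split would yield "" first.
        System.out.println(tokenize("#66 hello, world^ i love: you"));
        // prints [hello, world, i, love, you]
    }
}
```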
mishadoff

Use a regex based on negative lookahead, like this:

String[] arr = data.split("(?i)(?![a-z]).");

(?i) - ignore case
(?!...) - negative lookahead

This means: split on any single character that is not a-z or A-Z.
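
Because this pattern consumes one character per match, two adjacent delimiters (such as a comma followed by a space) leave an empty string between them. The sketch below contrasts that with a variant that repeats the group so a whole run of non-letters counts as one delimiter; the class and method names are invented for this illustration.

```java
import java.util.Arrays;

class LookaheadSplit {
    // The pattern from the answer: each non-letter character is its own delimiter.
    static String[] splitPerChar(String data) {
        return data.split("(?i)(?![a-z]).");
    }

    // Variant: repeat the group so a run of non-letters acts as one delimiter.
    static String[] splitPerRun(String data) {
        return data.split("(?i)(?:(?![a-z]).)+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitPerChar("a, b"))); // prints [a, , b]
        System.out.println(Arrays.toString(splitPerRun("a, b")));  // prints [a, b]
    }
}
```

Note that `.` does not match line terminators by default, so for multi-line input the character class `[^a-zA-Z]+` is the simpler choice.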

anubhava
  • (?!x) means "not followed by x", exactly as the name negative lookahead implies – Benjamin Gruenbaum Jan 17 '13 at 17:59
  • StringTokenizer st = new StringTokenizer(data, "(?i)(?![a-z]).") – Chris Olszewski Jan 17 '13 at 18:19
  • ACTUALLY it treats 'a' as a delimiter – Chris Olszewski Jan 17 '13 at 18:23
  • From what I see, the second argument of `StringTokenizer` is not a regex but a set of delimiter characters, so every character in this String will be treated as a delimiter. If you want to use a regex, use the `Scanner` class, the `split(regex)` method from the String class, or the `Pattern` and `Matcher` classes from java.util.regex. – Pshemo Jan 17 '13 at 18:25
  • I just want StringTokenizer to treat everything other than [a-zA-Z] as a delimiter – Chris Olszewski Jan 17 '13 at 18:27
  • @ChrisOlszewski: From Java doc: `StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead`. So better to use String#split method. – anubhava Jan 17 '13 at 18:38
  • @anubhava Sorry for the off-topic question, but I see you (and a lot of other people) use the form `String#split` instead of `String.split`. Is there any difference between these forms? Is it that `#` says the method is not static and should be invoked on an object, whereas `.` describes static methods? I tried to google it but without success. – Pshemo Jan 17 '13 at 19:10
  • @Pshemo: Yes, you guessed right. Usually `class.methodname` is used for static references and `class#method` for non-static methods. – anubhava Jan 17 '13 at 20:10
  • @anubhava Thank you. Every day is a good day to learn something new :) – Pshemo Jan 17 '13 at 20:12
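
The point raised in the comments, that `StringTokenizer` reads its second argument as a set of delimiter characters rather than a regex, can be checked with a short sketch. It also shows one regex-based alternative using `Pattern` and `Matcher` that collects runs of letters directly. The class `TokenizerVsRegex` and its method names are hypothetical, chosen only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenizerVsRegex {
    // StringTokenizer: every character of delims is a delimiter on its own.
    static List<String> withTokenizer(String data, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(data, delims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // Regex alternative: match the tokens (runs of letters) instead of the delimiters.
    static List<String> withMatcher(String data) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("[a-zA-Z]+").matcher(data);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // 'a' and 'i' appear in the "regex" string, so they become delimiters.
        System.out.println(withTokenizer("banana split", "(?i)(?![a-z])."));
        System.out.println(withMatcher("banana split, 42 times"));
    }
}
```

Matching the letters rather than splitting on the non-letters also avoids the empty leading token that `split` produces when the input starts with a delimiter.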