
I need to split an input String by commas, semicolons or whitespace (or a mix of the three). I would also like to treat multiple consecutive delimiters in the input as a single delimiter. Here's what I have so far:

String regex = "[,;\\s]+";    
return input.split(regex);

This works, except for when the input string starts with one of the delimiter characters, in which case the first element of the result array is an empty String. I do not want my result to have empty Strings, so that something like, ",,,,ZERO; , ;;ONE ,TWO;," returns just a three element array containing the capitalized Strings.

Is there a better way to do this than stripping out any leading characters that match my regex before invoking String.split?

Thanks in advance!

AndreiM
  • Not posting as an answer as I don't remember the Java regex API, but you could simply search for strings of non-delimiters instead of splitting on delimiters, e.g. using a regex like `[^,;\s]+`. – Max Shawabkeh Apr 28 '10 at 19:21
  • Apparently identical question, newer but with better accepted answer: https://stackoverflow.com/questions/9389503/how-to-prevent-java-lang-string-split-from-creating-a-leading-empty-string – Nicolas Raoul Mar 29 '17 at 06:37

4 Answers


No, there isn't. String's split() method can only discard trailing empty strings, which is what a limit of 0 as the second parameter does (the one-argument split(regex) behaves the same way):

return input.split(regex, 0);

but for leading delimiters, you'll have to strip them first:

return input.replaceFirst("^"+regex, "").split(regex, 0);
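A minimal, self-contained sketch of the strip-then-split approach, using the sample input from the question. The class and method names (`SplitDemo`, `splitTokens`) are made up for illustration. One edge case worth guarding: if the input consists only of delimiters, the stripped string is empty and `"".split(...)` returns a one-element array holding an empty String, so the sketch returns an empty array in that case.

```java
import java.util.Arrays;

public class SplitDemo {
    // Same delimiter class as in the question: commas, semicolons, whitespace.
    private static final String DELIMITERS = "[,;\\s]+";

    // Hypothetical helper: strips a leading run of delimiters, then splits.
    static String[] splitTokens(String input) {
        String trimmed = input.replaceFirst("^" + DELIMITERS, "");
        // "".split(regex) yields [""], so handle all-delimiter input explicitly.
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        // The one-argument split drops trailing empty strings.
        return trimmed.split(DELIMITERS);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitTokens(",,,,ZERO; , ;;ONE ,TWO;,")));
        // prints [ZERO, ONE, TWO]
    }
}
```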
Bart Kiers
  • A negative parameter? `If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.` From http://java.sun.com/javase/6/docs/api/java/lang/String.html#split%28java.lang.String,%20int%29 – Mark Byers Apr 28 '10 at 19:24
  • Whoops, yes, I meant 0. Thanks! – Bart Kiers Apr 28 '10 at 19:26

If by "better" you mean higher performance then you might want to try creating a regular expression that matches what you want to match and using Matcher.find in a loop and pulling out the matches as you find them. This saves modifying the string first. But measure it for yourself to see which is faster for your data.

If by "better" you mean simpler, then no I don't think there is a simpler way than the way you suggested: removing the leading separators before applying the split.
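For reference, the Matcher.find loop mentioned above might look roughly like this. It is only a sketch: the class and method names (`TokenFinder`, `findTokens`) are invented for the example, and the pattern is the "match the non-delimiters" regex suggested in the comments on the question. No claim is made here about which variant is faster; that still needs measuring on real data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenFinder {
    // Matches runs of characters that are NOT commas, semicolons or whitespace.
    private static final Pattern TOKEN = Pattern.compile("[^,;\\s]+");

    // Hypothetical helper: collects every non-delimiter run in order.
    static List<String> findTokens(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(findTokens(",,,,ZERO; , ;;ONE ,TWO;,"));
        // prints [ZERO, ONE, TWO]
    }
}
```

Because the pattern matches tokens rather than separators, leading and trailing delimiters never produce empty Strings, and an all-delimiter input simply gives an empty list.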

Mark Byers

Pretty much all splitting facilities built into the JDK are broken one way or another. You'd be better off using a third-party class such as Guava's Splitter, which is both flexible and correct in how it handles empty tokens and whitespace:

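// Newer Guava releases replace the CharMatcher.WHITESPACE constant with CharMatcher.whitespace().
Splitter.on(CharMatcher.anyOf(";,").or(CharMatcher.whitespace()))
    .omitEmptyStrings()
    .split(",,,ZERO;,ONE TWO");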

will yield an Iterable<String> containing "ZERO", "ONE" and "TWO".

Julien Silland

You could also use StringTokenizer to build the list, depending on what you need to do with it:

StringTokenizer st = new StringTokenizer(",,,ZERO;,ONE TWO", ",; ", false);
while(st.hasMoreTokens()) {
  String str = st.nextToken();
  //add to list, process, etc...
}

As a caveat, however, you'll need to define each potential whitespace character separately in the second argument to the constructor.
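To illustrate that caveat, here is a sketch that lists several whitespace characters explicitly and collects the tokens into a List. The names `TokenizerDemo` and `tokenize` are hypothetical, and the delimiter set is an assumption about which whitespace can occur in the input; it covers space, tab, newline, carriage return and form feed, but not the vertical tab that the regex class `\s` also matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerDemo {
    // Assumed delimiter set: comma, semicolon, and five common whitespace characters.
    private static final String DELIMS = ",; \t\n\r\f";

    // Hypothetical helper: collects all tokens; consecutive delimiters never yield empty tokens.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, DELIMS);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(",,,ZERO;\t,ONE\nTWO"));
        // prints [ZERO, ONE, TWO]
    }
}
```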

mtruesdell