2

Consider a string like the one below, with the delimiter __|__.

String str = "a_b__|__c_d";

str.split("__\\|__") gives 2 splits: a_b and c_d. StringUtils.split(str, "__|__") or StringUtils.split(str, "__\\|__") gives 4 splits (a, b, c, d), which is not desired.

Is there any way to make StringUtils.split() give the same results as String.split()?

user1013528

2 Answers

7

String.split() has some very surprising semantics, and it's rarely what you want. You should prefer StringUtils (or Guava's Splitter, discussed in the previous link).
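As a rough illustration of the kind of behavior that tends to surprise people, the sketch below uses only the JDK. The class and method names (SplitSemantics, defaultSplit, keepAll) are made up for this example; the point is that the one-argument String.split() silently drops trailing empty strings, while a negative limit keeps them.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class SplitSemantics {
    // One-argument split: trailing empty strings are removed from the result.
    static String[] defaultSplit(String s, String regex) {
        return s.split(regex);
    }

    // A negative limit keeps every token, including trailing empty ones.
    static String[] keepAll(String s, String regex) {
        return s.split(regex, -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(defaultSplit("-a--b-", "-"))); // [, a, , b]
        System.out.println(Arrays.toString(keepAll("-a--b-", "-")));      // [, a, , b, ]
        System.out.println(defaultSplit("", "-").length);                 // 1 (a single empty string)
        System.out.println(defaultSplit("-", "-").length);                // 0 (everything was trailing-empty)
    }
}
```

Note that the leading empty string survives while the trailing one does not, and that an empty input yields one token whereas a lone separator yields none.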

Your specific issue is that String.split() takes a regular expression, while StringUtils.split() treats each character of its second argument as a separate delimiter. You should use StringUtils.splitByWholeSeparator() to split on the whole separator string.

StringUtils.splitByWholeSeparator(str, "__|__");
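For readers without Commons Lang on the classpath, the difference between "whole separator" and "any of these characters" can be sketched with the JDK alone. This is only an analogy, not how StringUtils is implemented: Pattern.quote() makes String.split() treat the separator literally, and StringTokenizer stands in for the per-character behavior of StringUtils.split(). The names DelimiterDemo, splitOnLiteral and splitOnAnyChar are hypothetical, and the null check is an assumption added to mirror the null tolerance mentioned in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class DelimiterDemo {
    // Split on the separator as one literal unit (no regex escaping needed).
    static String[] splitOnLiteral(String s, String separator) {
        if (s == null) {
            return null; // assumption: pass null through instead of throwing
        }
        return s.split(Pattern.quote(separator));
    }

    // Split on ANY character contained in delimiterChars, skipping empty tokens.
    static String[] splitOnAnyChar(String s, String delimiterChars) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(s, delimiterChars);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String str = "a_b__|__c_d";
        System.out.println(Arrays.toString(splitOnLiteral(str, "__|__"))); // [a_b, c_d]
        System.out.println(Arrays.toString(splitOnAnyChar(str, "__|__"))); // [a, b, c, d]
    }
}
```

For this input the literal split reproduces the two tokens the question asks for; the two approaches may still differ from the Commons Lang methods on edge cases such as adjacent or trailing separators.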
dimo414
  • I recommend StringUtils.splitByWholeSeparatorPreserveAllTokens("-a--b-", "-") -> ["", "a", "", "b", ""] which matches PHP's explode("-", "-a--b-") -> ["", "a", "", "b", ""]. StringUtils.splitByWholeSeparator("-a--b-", "-") returns ["a", "b", ""] which I find unexpected. – Zack Morris Feb 05 '18 at 01:33
  • Matching PHP semantics is, generally speaking, probably an anti-goal ;) – dimo414 Feb 05 '18 at 06:04
1

No. As per the documentation, the second parameter of StringUtils.split is the set of all characters that are considered separators. There is a different function in Apache Commons which does what you want: StringUtils.splitByWholeSeparator. Still, I don't see what's wrong with a simple String.split.

yeputons
  • Thanks. I didn't realize such a method exists. I could use String.split(), but I prefer StringUtils.splitByWholeSeparator() as it also takes care of null strings. – user1013528 Jun 06 '17 at 23:32