String class has some methods that i cannot understand why they were implemented like this... replace is one of them.
public String replace(CharSequence target, CharSequence replacement) {
return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}
Are there some significant advantages over a simpler and more efficient (fast!) method?
public static String replace(String string, String searchFor, String replaceWith) {
StringBuilder result=new StringBuilder();
int index=0;
int beginIndex=0;
while((index=string.indexOf(searchFor, index))!=-1){
result.append(string.substring(beginIndex, index)+replaceWith);
index+=searchFor.length();
beginIndex=index;
}
result.append(string.substring(beginIndex, string.length()));
return result.toString();
}
Stats with Java 7:
1,000,000 iterations
replace "b" with "x" in "a.b.c"
result: "a.x.c"
Times:
string.replace: 485ms
string.replaceAll: 490ms
optimized replace = 180ms
Code like the Java 7 split method is heavily optimized to avoid pattern compile / regex processing when possible:
public String[] split(String regex, int limit) {
/* fastpath if the regex is a
(1)one-char String and this character is not one of the
RegEx's meta characters ".$|()[{^?*+\\", or
(2)two-char String and the first char is the backslash and
the second is not the ascii digit or ascii letter.
*/
char ch = 0;
if (((regex.value.length == 1 &&
".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
(regex.length() == 2 &&
regex.charAt(0) == '\\' &&
(((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
((ch-'a')|('z'-ch)) < 0 &&
((ch-'A')|('Z'-ch)) < 0)) &&
(ch < Character.MIN_HIGH_SURROGATE ||
ch > Character.MAX_LOW_SURROGATE))
{
int off = 0;
int next = 0;
boolean limited = limit > 0;
ArrayList<String> list = new ArrayList<>();
while ((next = indexOf(ch, off)) != -1) {
if (!limited || list.size() < limit - 1) {
list.add(substring(off, next));
off = next + 1;
} else { // last one
//assert (list.size() == limit - 1);
list.add(substring(off, value.length));
off = value.length;
break;
}
}
// If no match was found, return this
if (off == 0)
return new String[]{this};
// Add remaining segment
if (!limited || list.size() < limit)
list.add(substring(off, value.length));
// Construct result
int resultSize = list.size();
if (limit == 0)
while (resultSize > 0 && list.get(resultSize - 1).length() == 0)
resultSize--;
String[] result = new String[resultSize];
return list.subList(0, resultSize).toArray(result);
}
return Pattern.compile(regex).split(this, limit);
}
Following the logic of the replace method:
public String replaceAll(String regex, String replacement) {
return Pattern.compile(regex).matcher(this).replaceAll(replacement);
}
The split implementation should be:
public String[] split(String regex, int limit) {
return Pattern.compile(regex).split(this, limit);
}
The performance losses are not far from the ones found on the replace methods. For some reason Oracle gives a fastpath approach on some methods and not others.