
I was looking at the source for java.lang.String and noticed the equals method doesn't check whether the char[] backing each String is the same object. Wouldn't this improve compare times?

Here is a rewritten version containing the supposed improvement:

```java
public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = count;
        if (n == anotherString.count) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = offset;
            int j = anotherString.offset;
            /** Begin Optimization **/
            if(v1==v2 && i==j){
                return true;
            }
            /** End Optimization **/
            while (n-- != 0) {
                if (v1[i++] != v2[j++])
                    return false;
            }
            return true;
        }
    }
    return false;
}
```
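To try the proposal without depending on JDK internals, here is a minimal sketch that models the pre-7u6 layout with a hypothetical `Slice` class: a possibly shared `char[]` plus an `offset` and a `count`. `Slice` and its methods are illustrative assumptions, not the real `java.lang.String`; `substring` shares the array the way Java 6 did, and `equals` contains the same shortcut as the code above before falling back to the character loop.

```java
import java.util.Objects;

/** Hypothetical stand-in for the pre-7u6 String layout: a view onto a possibly shared char[]. */
final class Slice {
    private final char[] value; // backing array, possibly shared with other slices
    private final int offset;   // index of this slice's first char within value
    private final int count;    // number of chars in this slice

    Slice(char[] value, int offset, int count) {
        Objects.checkFromIndexSize(offset, count, value.length); // Java 9+
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    /** Shares the backing array instead of copying it, like Java 6's substring. */
    Slice substring(int begin, int end) {
        Objects.checkFromToIndex(begin, end, count);
        return new Slice(value, offset + begin, end - begin);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Slice)) return false;
        Slice other = (Slice) o;
        int n = count;
        if (n != other.count) return false;
        // The proposed shortcut: same array, same start and same count means same chars.
        if (value == other.value && offset == other.offset) return true;
        int i = offset, j = other.offset;
        while (n-- != 0) {
            if (value[i++] != other.value[j++]) return false;
        }
        return true;
    }

    @Override
    public int hashCode() {
        // Same polynomial as String.hashCode, so equal slices hash equally.
        int h = 0;
        for (int k = offset; k < offset + count; k++) h = 31 * h + value[k];
        return h;
    }

    @Override
    public String toString() {
        return new String(value, offset, count); // copies the chars into a real String
    }
}
```

With this model, `x.substring(7, 14).equals(x.substring(7, 14))` takes the shortcut, while slices with the same count but different offsets still go through the loop, which is the only case that actually needs it.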

I believe this would improve performance when the two Strings were obtained through String.substring, and possibly even for interned Strings.

Does anybody know if there's a reason they chose not to implement it this way?

Update: For anybody who might not know a lot about the implementation of String, there are cases other than the String pool where two String objects can have the same char[] value, int offset, and int count.

Consider the following code:

```java
String x = "I am a String, yo!";
String y = x.split(" ")[3];
String z = x.substring(7,14);
```

You would end up with a situation like this: [screenshot: Debugger Expressions]
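On a current JDK the backing arrays cannot be inspected without reflection (and, per the update below, they are no longer shared anyway), but a short program confirms what the example above produces: `y` and `z` are distinct String objects with identical contents. That is exactly the case in which, before 7u6, both would have pointed into `x`'s array with the same offset and count.

```java
public class SplitSubstringDemo {
    public static void main(String[] args) {
        String x = "I am a String, yo!";
        String y = x.split(" ")[3];    // fourth space-separated token
        String z = x.substring(7, 14); // characters at indices 7..13
        System.out.println(y);           // String,
        System.out.println(y == z);      // false: two distinct String objects
        System.out.println(y.equals(z)); // true: identical contents
    }
}
```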

Also apparently the value-sharing feature of Strings has been done away with in Java 7u6 in order to satisfy some benchmarks. So if you spent time making your code run in decent time (or at all) by using String.substring() rather than String concatenation, you're SOL.

UFL1138
  • The way to do that is to compare each character. – Sotirios Delimanolis Sep 18 '13 at 20:01
  • Maybe it doesn't make sense because it always works with arrays produced by `clone()` – nachokk Sep 18 '13 at 20:03
  • It might have improved performance for very rare cases pre-Java 7 -- e.g. `foo.substring(i, j).equals(foo.substring(i, j))` -- but it would also incur an extra check for the much more common case when the string arrays are _not_ equal, which seems likely to cost more time on average than it saves. See e.g. [this blog post](http://smallwig.blogspot.com/2010/03/little-optimization-that-couldnt.html). – Louis Wasserman Sep 18 '13 at 20:13
  • @SotiriosDelimanolis I'm not asking whether they are `.equals()`; I'm asking whether they're the same object. – UFL1138 Sep 18 '13 at 22:11
  • @LouisWasserman I would say the most important difference between this optimization and his example is that this would prevent an O(n) operation by running one more O(1) line of code (1 or 2 comparisons, depending on how lucky we are); while his example is only preventing an O(n) operation where n=0 or n=1 (essentially). – UFL1138 Sep 19 '13 at 00:46
  • Even so. `n` is not often very large, and this case is massively less common than the usual case. – Louis Wasserman Sep 19 '13 at 03:34

4 Answers


Well, you'd need to check the `char[]`, the offset and the count (string length). Since the `char[]` is only created from within the String class, the only way for all three of those to be equal would be for a String to create a doppelgänger from itself. You can get it to do that (e.g. `new String("why?")`), but it's not a common use case.

<speculative> I'm not even sure if it would speed anything up. The vast majority of the time, the check will fail, meaning that it's doing extra work for no benefit. That could be offset by branch prediction, but in that case, the few times the check passes, it'll invalidate guesses made by that branch prediction, which could actually slow things down. In other words, if the JVM/CPU tries to optimize for the common case, you'll usually gain nothing, and you'll actually hurt yourself in the rare case (which is what you're trying to optimize). If it doesn't try to optimize that common case, you hurt yourself in most comparisons for the sake of a fairly rare set of comparisons. </speculative>

yshavit
  • count is already being checked as the condition of the if() block surrounding my code. And if some strings are built from substrings of the same string, they would have the same char[] and could have the same count and offset. – UFL1138 Sep 18 '13 at 22:13
  • Ah true on the count issue, my eyes just skimmed the code. And I mentioned that it's possible to have the same `char[]`, `count` and `offset` -- that's just not the common case. I read @LouisWasserman's link after I posted this answer, but it's a good post on the issues of this kind of "optimization." – yshavit Sep 18 '13 at 22:51
  • I imagine it's also not the common case for anObject to be the same as the String being compared, or for the Object being compared not to be a String -- but they are both checked in the equals() method anyway. It's a tradeoff between one more (and sometimes two more) comparisons in constant O(1) before kicking off a series of comparisons that occur in linear O(n) time; and all 4 variables are already loaded. – UFL1138 Sep 19 '13 at 00:03
  • @UFL1138: The instanceof check is no optimization; it's needed for correctness. I guess the fast `==` check has better chances than yours, and it saves all the work... – maaartinus Sep 25 '13 at 23:22

In Java 7 (see this article), `substring()` no longer uses the same backing array for the returned String, so you would still need to check each character. Basically, the backing `char[]`s of Strings are never shared, so you can't shortcut the comparison with:

```java
this.value == other.value
```
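The copying behaviour described here can be mimicked in user code with `Arrays.copyOfRange` (the comments below say the JDK's constructor uses it; the `copyRange` helper is hypothetical and not the JDK's own code). Two copies of the same range are different arrays with equal contents, so an identity check between them never short-circuits, while `java.util.Arrays.equals` still compares them correctly.

```java
import java.util.Arrays;

public class CopyingSubstringDemo {
    /** Hypothetical helper: copies chars [begin, end) into a fresh array, as a copying substring would. */
    static char[] copyRange(char[] source, int begin, int end) {
        return Arrays.copyOfRange(source, begin, end); // always allocates a new array
    }

    public static void main(String[] args) {
        char[] source = "I am a String, yo!".toCharArray();
        char[] a = copyRange(source, 7, 14);
        char[] b = copyRange(source, 7, 14);
        System.out.println(a == b);              // false: separate arrays
        System.out.println(Arrays.equals(a, b)); // true: same contents
        System.out.println(new String(a));       // String,
    }
}
```

Note that `Arrays.equals(a, b)` compares whole arrays, so it can only stand in for a String comparison once every String owns an array whose offset is 0 and whose length equals the string length, as is the case after this change.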
Sotirios Delimanolis
  • So you mean that in Java 7 it creates a copy of the backing array instead of using the same backing array? – gparyani Sep 18 '13 at 20:07
  • You could, however, replace that code with `java.util.Arrays.equals(value, other.value)`. – gparyani Sep 18 '13 at 20:09
  • @gparyani Yes, the constructor that's called uses `Arrays.copyOfRange` – Sotirios Delimanolis Sep 18 '13 at 20:09
  • @gparyani The `equals` methods are pretty much equivalent. – Sotirios Delimanolis Sep 18 '13 at 20:10
  • I'm a bit skeptical; that article you linked to cites no sources and looking at [OpenJDK's implementation](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/java/lang/String.java#String.substring%28int%29), it certainly continues to behave as always (shared char[]). – UFL1138 Sep 18 '13 at 22:48
  • @UFL1138 The Oracle JDK no longer shares backing array. – Sotirios Delimanolis Sep 18 '13 at 22:49
  • @UFL1138 Look at this [bug report](http://bugs.sun.com/view_bug.do?bug_id=4513622) and the [question](http://stackoverflow.com/questions/16123446/java-7-string-substring-complexity) in which I found the link. – Sotirios Delimanolis Sep 18 '13 at 22:49
  • In Java 8, `StringBuffer.toString` uses the constructor `String(char[] value, boolean share)`, so `this.value == other.value` may be `true`. – xmcx Dec 16 '22 at 09:12

I don't understand this question.
The `char[]` is an internal member of String. If two String references are the same (they should be, since you should be using interned strings), the `char[]` is the same as well.
But for different instances, why would you expect the `char[]` to be the same reference? Strings are immutable, and it is not possible for two different String objects to share a reference to the same backing array.
Additionally, it does not make sense to use this conditional check even for substring.
I was not aware of the change in Java 7 mentioned in one of the answers, but it would be wrong to check for equality of the backing array in this case.
A String object is not only the backing array but also its current offset, length, etc.
So two String objects produced by substring may be backed by the same char array but can very well contain different (sub)strings, i.e. different offsets into the same char array.
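A small illustration of that last point: before 7u6, both substrings below would have shared `s`'s array, with offsets 0 and 3 and the same count, yet their contents differ. That is why the proposed shortcut compares `offset` as well as the array reference (the `count` check already precedes it). On current JDKs each substring gets its own copy, so only the values are observable in this sketch.

```java
public class SameBufferDifferentOffsets {
    public static void main(String[] args) {
        String s = "abcabd";
        String first = s.substring(0, 3);  // "abc" (offset 0 into s's array before 7u6)
        String second = s.substring(3, 6); // "abd" (offset 3 into s's array before 7u6)
        System.out.println(first.length() == second.length()); // true: equal counts
        System.out.println(first.equals(second));              // false: different chars
    }
}
```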

Cratylus
  • You should take a look at [the source code for String](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/String.java). Consider the following: `String x = "I am a String, guy!"; String y = x.substring(5,7); String z = x.substring(5,7);` `y` and `z` will have the same offset and count as each other, as well as share the same `char[]` as `x`. – UFL1138 Sep 18 '13 at 23:59
  • @UFL1138: If you do `x.substring(5,7)` and `y.substring(0,3)` you still have the same buffer but different strings – Cratylus Sep 22 '13 at 18:43
0

Doing such a check on the backing character array will most likely be redundant and is not required.

There are two cases where the backing character arrays could be identical objects (as others have pointed out, the substring method now always creates a new backing character array):

Defining a string literal

```java
String a = "Hello";
a.equals("Hello"); // Backing array of "Hello" string literal
                   // will be same as that of variable a
```

In this case, the equals method determines that the Strings are equal at the following line, before it ever looks at the backing char array:

```java
if (this == anObject) { // From String.equals method
    return true;
}
```
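The identity check catches the literal case because string literals with the same contents are interned and refer to the same `String` instance (Java Language Specification, §3.10.5). A quick check:

```java
public class LiteralIdentityDemo {
    public static void main(String[] args) {
        String a = "Hello";
        // Both literals refer to the same interned String instance,
        // so String.equals can return at its this == anObject check.
        System.out.println(a == "Hello");      // true
        System.out.println(a.equals("Hello")); // true
    }
}
```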

Using String copy constructor to create another String object

Note that the following code has no practical value and would hardly ever appear in real code.

```java
String a = "Hello";
String b = new String(a);
a.equals(b);
```
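In the copy-constructor case, `new` always yields a distinct object, so the `this == anObject` check fails and `equals` has to go on to compare contents. Calling `intern()` on the copy maps it back to the pooled literal. A minimal check:

```java
public class CopyConstructorDemo {
    public static void main(String[] args) {
        String a = "Hello";
        String b = new String(a);            // always a new String object
        System.out.println(a == b);          // false: distinct objects
        System.out.println(a.equals(b));     // true: same characters
        System.out.println(b.intern() == a); // true: intern() returns the pooled literal
    }
}
```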

So rather than doing an extra check to determine whether the character arrays are the same, it is safe to assume they will always be different if the String objects are different.

Arun