10

I came across the below code snippet in a product's code. It is using bitwise XOR for string comparison. Is this better than the String.equals(Object o) method? What is the author trying to achieve here?

private static boolean compareSecure(String a, String b)
  {
    if ((a == null) || (b == null)) {
      return (a == null) && (b == null);
    }
    int len = a.length();
    if (len != b.length()) {
      return false;
    }
    if (len == 0) {
      return true;
    }
    int bits = 0;
    for (int i = 0; i < len; i++) {
      bits |= a.charAt(i) ^ b.charAt(i);
    }
    return bits == 0;
  }

For context, the strings being equated are authentication tokens.

Abhishek
  • 1,130
  • 1
  • 12
  • 25
  • 4
    It is a slower way of comparing strings, since it has to always compare all characters. It is no more secure than `String.equals()`. – Andreas Nov 14 '17 at 22:51
  • Also it will throw an exception if a != null and b==null. – FredK Nov 14 '17 at 22:53
  • 2
    @FredK Not true. Re-read the first `if` statement. That might actually be the only improvement over `String.equals()`: It is null-safe, and maybe that's the "secure" part. In Java 7+ you can get the same null-safe behavior using [`Objects.equals()`](https://docs.oracle.com/javase/7/docs/api/java/util/Objects.html#equals(java.lang.Object,%20java.lang.Object)) – Andreas Nov 14 '17 at 22:54
  • 1
    This is not secure than standard `equals`. – Joe C Nov 14 '17 at 22:55
  • 2
    @Zong No collision issue. It is using `bits |=`, not `bits ^=`. If any two characters are not equal, one or more bits will be set, and will stay set until the end. – Andreas Nov 14 '17 at 22:57
  • @Andreas Thanks for pointing that out, my bad :) – Zong Nov 14 '17 at 22:58
  • 6
    This comparison is intended to not be affected by the contents of the strings as much, to prevent timing attacks (normal string comparison leaks information about how long the equal-prefix is). – harold Nov 14 '17 at 23:16
  • @harold I wouldn't expect it to short-circuit on unequal lengths and null if that was the point. – shmosel Nov 14 '17 at 23:22
  • 1
    thanks @harold. your comment led me to run a search and came across this post https://security.stackexchange.com/questions/83660/simple-string-comparisons-not-secure-against-timing-attacks . Looks like yours is the best answer. – Abhishek Nov 14 '17 at 23:26
  • I wonder if the JIT compiler could optimize it into a short-circuiting loop anyway. – shmosel Nov 14 '17 at 23:29
  • |= (equal or) If set once, while iterating through charAt(i), with a num greater than 0, it will stay, and the bits == 0 will evaluate to false. – z atef Feb 16 '21 at 06:01

1 Answers1

13

This is a common implementation of string comparison function that is invulnerable to timing attacks.

In short, the idea is to compare all the characters every time you compare strings, even if you find any of them are not equal. In "standard" implementation you just break on the first difference and return false.

This is not secure because it gives away the information about the compared strings. Specifically if the left-side string is a secret you want to keep (e.g. password), and the right-side string is something provided by the user, an unsafe method allows the hacker to uncover your password with a relative ease, by repeatedly trying out different strings and measuring the response time. The more characters in the two strings are identical, the more the 'unsecure' function would take to compare them.

For instance, comparing "1234567890" and "0987654321" using a standard method would result in doing just a single comparison of the first character and returning false, since 1!=0. On the other hand comparing "1234567890" to "1098765432", would result in executing 2 comparison operations, because the first ones are equal, you have to compare the second ones to find they are different. This would take a bit more time and it is measurable, even when we are talking about remote calls.

If you do N attacks with N different strings, each starting with a different character, you should see one of the of the results taking a fraction of a milisecond more then the rest. This means the first character is the same, so the function has to take more time to compare the second one. Rinse and repeat for each position in the string and you can crack the secret orders of magnitude faster then brute force.

Preventing such attack is the point of such implementation.

Edit: As diligently pointed out in comment by Mark Rotteveel, this implementation is still vulnerable to timing attack that is aimed at revealing the length of the string. Still this is not a problem in many cases (either you don't care about attacker knowing the length or you deal with data that is standard and anyone can know the length anyway, for instance some kind of known-length hash)

K. Kirsz
  • 1,384
  • 10
  • 11