MessageDigest.digest() returning same hash for different strings having Norwegian character

Question

I am calling the MessageDigest.digest() method to get the hash of a password. If the password contains a Norwegian character, e.g. 'ø', the method returns the same hash for strings that differ only in their last character. "Høstname1" and "Høstname2" have the same hash, but "Hostnøme1" has a different hash because the 'ø' is in a different position. This happens with "utf-8" encoding; with "iso-8859-1" encoding I do not see the issue. Is this a known problem, or am I missing something here?

This is my code:

    import java.security.MessageDigest;

    String password = "Høstname1";
    String salt = "6";

    MessageDigest messageDigest = MessageDigest.getInstance("SHA-256");
    byte[] hash = new byte[40];
    messageDigest.update(salt.getBytes("utf-8"), 0, salt.length());
    messageDigest.update(password.getBytes("utf-8"), 0, password.length());
    hash = messageDigest.digest();

Just a guess but it might have something to do with the encoding of your sourcefile and the encoding set for the compiler. — André Stannek, May 16 '17 at 14:17

wero · Accepted Answer · 2017-05-16T17:42:47.113

You shouldn't pass the length of the string to messageDigest.update

    messageDigest.update(password.getBytes("utf-8"), 0, password.length());

but the length of the byte array, since the UTF-8 encoded string has more bytes than the string has characters whenever it contains a non-ASCII character:

    byte[] pwd = password.getBytes("utf-8");
    messageDigest.update(pwd, 0, pwd.length);

or, even shorter (thanks @Matt):

    messageDigest.update(password.getBytes("utf-8"));

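The same applies to the salt.

As a result, your code hashed only the first password.length() bytes of the encoded password. "Høstname1" is 9 characters but 10 bytes in UTF-8 ('ø' takes two bytes), so the final byte, the '1' or '2', was never fed to the digest and both strings produced the same hash.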
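To see the effect side by side, here is a minimal sketch that hashes the two passwords from the question both ways. The class name `SaltedHashDemo` and the helper names `hashTruncated` and `hashFull` are made up for illustration, and `'ø'` is written as the escape `\u00f8` so the result does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class SaltedHashDemo {

    // SHA-256 is required on every Java platform, so a missing algorithm is
    // treated as an unrecoverable configuration error.
    static MessageDigest sha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reproduces the bug from the question: the char count of the string is
    // passed where the byte count of the encoded array is expected.
    static byte[] hashTruncated(String salt, String password) {
        MessageDigest md = sha256();
        md.update(salt.getBytes(StandardCharsets.UTF_8), 0, salt.length());
        md.update(password.getBytes(StandardCharsets.UTF_8), 0, password.length());
        return md.digest();
    }

    // Fixed variant: the single-argument overload hashes the whole array.
    static byte[] hashFull(String salt, String password) {
        MessageDigest md = sha256();
        md.update(salt.getBytes(StandardCharsets.UTF_8));
        md.update(password.getBytes(StandardCharsets.UTF_8));
        return md.digest();
    }

    public static void main(String[] args) {
        String a = "H\u00f8stname1"; // "Høstname1"
        String b = "H\u00f8stname2"; // "Høstname2"

        // Buggy variant drops the last byte of each password.
        System.out.println(Arrays.equals(hashTruncated("6", a), hashTruncated("6", b))); // true
        // Fixed variant hashes every byte.
        System.out.println(Arrays.equals(hashFull("6", a), hashFull("6", b)));           // false
    }
}
```

Using `StandardCharsets.UTF_8` instead of the string `"utf-8"` also avoids the checked `UnsupportedEncodingException`, though either form hashes the same bytes.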

edited May 16 '17 at 17:42

answered May 16 '17 at 14:20


Thanks, it is working now. I am just wondering why it was working earlier with passwords made up only of English characters. – namang029 May 16 '17 at 14:58
@namang029 if all characters in the password are < 128 then `password.getBytes("utf-8").length == password.length()` so you didn't notice the bug – wero May 16 '17 at 15:21
@namang029 use the overload without the length so you don't have to worry about whether or not you're making this mistake – Matt Timmermans May 16 '17 at 16:22
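The point about characters below 128 can be checked directly. The following is a small sketch (the class name `Utf8LengthDemo` and the helper `utf8Length` are hypothetical) that compares the char count with the encoded byte count. It also suggests why the question saw no problem with "iso-8859-1": that encoding stores 'ø' in a single byte, so the two lengths happen to agree.

```java
import java.nio.charset.StandardCharsets;

class Utf8LengthDemo {

    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "Hostname1";
        String norwegian = "H\u00f8stname1"; // "Høstname1"

        // ASCII only: one byte per char, so the bug stays hidden.
        System.out.println(ascii.length() + " chars, " + utf8Length(ascii) + " bytes");         // 9 chars, 9 bytes
        // 'ø' needs two bytes in UTF-8, so the byte array is longer than the string.
        System.out.println(norwegian.length() + " chars, " + utf8Length(norwegian) + " bytes"); // 9 chars, 10 bytes
        // ISO-8859-1 encodes 'ø' in one byte.
        System.out.println(norwegian.getBytes(StandardCharsets.ISO_8859_1).length + " bytes");  // 9 bytes
    }
}
```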
