I am creating an MD5 digest in Java, which I need in order to calculate a 4-byte hex hash of the input string. Here is the Java code:

public static String hashString(String s) {
    MessageDigest md;
    try {
        md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(s.getBytes("US-ASCII"));
        byte[] output = new byte[digest.length / 4];
        for (int i = 0; i < output.length; i++) {
            for (int j = 0; j < digest.length; j += 4) {
                System.out.print(digest[j]);
                output[i] ^= digest[i + j];
            }
        }
        return getHexString(output);
    } catch (NoSuchAlgorithmException | UnsupportedEncodingException e) {
        return null;
    }
}

I want to use the same logic in Go; however, the MD5 output differs from what I get in Java. Below is the Go code:

func hashString(s string) string {
    md := md5.New()
    md.Write([]byte(s))
    data := md.Sum(nil)
    fmt.Println(data)
    output := make([]byte, len(data)/4)
    for i := 0; i < len(output); i++ {
        for j:=0 ;j < len(data); j++ {
            output[i] ^= data[i + j]
            fmt.Print(output[i])
        }
    }
    return getHexString(output)
}

I have added print statements to both code samples. As I am new to Go, I am not aware of any other libraries or approaches for this; I just followed what I found on the internet. It would be really great if someone could help with this.

icza
Yoda
  • You are printing `output[i]` instead of `digest[j]` in the `go` code. Isn't that what makes the difference in the logs you see? – Ullaakut Aug 14 '18 at 11:37
  • You should make clear what input you use and what the output is. – Volker Aug 14 '18 at 11:38

1 Answer

1. Different loops

Your inner loops are different.

In Java:

for (int j = 0; j < digest.length; j += 4) {
    System.out.print(digest[j]);
    output[i] ^= digest[i + j];
}

In Go:

for j:=0; j < len(data); j++ {
    output[i] ^= data[i + j]
    fmt.Print(output[i])
}

Notice in Java you increment the loop variable by 4, and in Go only by 1. Use:

for j:=0; j < len(data); j += 4 {
    output[i] ^= data[i + j]
    fmt.Print(output[i])
}
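
For reference, the corrected loop can be put into a complete program. This is only a sketch: the name foldMD5 is made up for illustration, and it assumes that getHexString (which is not shown in the question) is equivalent to lowercase hex encoding via hex.EncodeToString. The debug prints are left out.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// foldMD5 is a hypothetical name for the corrected hashString. It XORs the
// four 4-byte words of the 16-byte MD5 digest together into a 4-byte result
// and returns that result as 8 hex characters.
func foldMD5(s string) string {
	digest := md5.Sum([]byte(s)) // [16]byte
	output := make([]byte, len(digest)/4)
	for i := range output {
		// Step by 4, as the Java loop does: digest[i], digest[i+4], ...
		for j := 0; j < len(digest); j += 4 {
			output[i] ^= digest[i+j]
		}
	}
	// Assumption: getHexString behaves like lowercase hex encoding.
	return hex.EncodeToString(output)
}

func main() {
	// MD5("") is d41d8cd9 8f00b204 e9800998 ecf8427e; the fold XORs
	// those four words byte by byte.
	fmt.Println(foldMD5(""))
}
```

Feeding the same ASCII input to the Java version and comparing the hex strings is a quick way to confirm that both implementations agree.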

2. Different returned data

UPDATE: The asker clarified that this was just a typo in the posted code, and it has since been edited out.

Also your Java solution returns the hex representation of the output:

return getHexString(output);

But in Go you return the hex representation of the (full) MD5 digest:

return getHexString(md.Sum(nil))

So in Go also do:

return getHexString(output)

3. Input string => byte sequence conversion

One last note. In Java you convert your input string to a sequence of bytes using US-ASCII encoding, and in Go you are using the UTF-8 encoded sequence of your input string, because this is how Go stores strings naturally (so you get the UTF-8 byte sequence when you do []byte("some text")).

This will result in the same input data for texts using only characters of the ASCII table (whose code is less than 128), but will result in different data for texts that contain characters above that (as they will translate into multi-byte sequences in UTF-8). Something to keep in mind!
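
To see the difference in practice, here is a small sketch that prints the bytes Go produces for an ASCII-only string and for one containing a non-ASCII character. The helper isASCII is a hypothetical name introduced here; it is one possible way to check up front whether an input will hash identically on both sides.

```go
package main

import "fmt"

// isASCII (hypothetical helper) reports whether every byte of s is below
// 128, i.e. whether the UTF-8 bytes of s match its US-ASCII encoding.
func isASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] >= 0x80 {
			return false
		}
	}
	return true
}

func main() {
	// "\u00e9" is the letter e with an acute accent; it takes two bytes in UTF-8.
	for _, s := range []string{"cafe", "caf\u00e9"} {
		fmt.Printf("%q: % x (ASCII only: %v)\n", s, []byte(s), isASCII(s))
	}
}
```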


Simplification

Also note that to calculate the MD5 digest of some data, you may simply use the md5.Sum() function since you're throwing away the created hash.Hash anyway:

func hashString(s string) string {
    dataArr := md5.Sum([]byte(s))
    data := dataArr[:]
    fmt.Println(data)
    output := make([]byte, len(data)/4)
    for i := 0; i < len(output); i++ {
        for j := 0; j < len(data); j += 4 {
            output[i] ^= data[i+j]
            fmt.Print(output[i])
        }
    }
    return getHexString(output)
}

Byte representation differences

You said the contents of the result arrays are different. This is because the byte type in Java is signed, with a range of -128..127, while in Go byte is an alias of uint8, with a range of 0..255. So if you want to compare the results, you have to add 256 to the negative Java values.

If you convert the byte arrays (or slices) to a hex representation, it will be the same (hex representation has no "signedness" property).
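
As an illustration, the following sketch prints a few byte values the way Go shows them, the way Java would show them, and as hex. The helper javaByte is a hypothetical name; it just reinterprets Go's unsigned byte as a signed 8-bit value, and the sample bytes are the first four bytes of the MD5 digest of the empty string.

```go
package main

import "fmt"

// javaByte (hypothetical helper) reinterprets an unsigned Go byte as the
// signed 8-bit value that Java would print for the same bit pattern.
func javaByte(b byte) int8 {
	return int8(b)
}

func main() {
	data := []byte{0xd4, 0x1d, 0x8c, 0xd9}
	for _, b := range data {
		// The decimal views differ for values above 127; the hex view does not.
		fmt.Printf("go=%3d java=%4d hex=%02x\n", b, javaByte(b), b)
	}
}
```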

You can read more about this here: Java vs. Golang for HOTP (rfc-4226) (https://stackoverflow.com/questions/47797100/java-vs-golang-for-hotp-rfc-4226/47797161#47797161)

icza
  • OK, for the hex representation, I mistakenly added md.Sum(nil). In the actual code it returns the hex representation of output only. Let me try the first correction though. However, the bytes in the array digest in Java and the array data in Go are different. – Yoda Aug 14 '18 at 11:48
  • @Yoda See edited answer. `byte` in Java is signed and has a valid range of `-128..127`, while `byte` in Go is unsigned and has a valid range of `0..255`. Read more about it in [this answer](https://stackoverflow.com/questions/47797100/java-vs-golang-for-hotp-rfc-4226/47797161#47797161). But the hex representation is the same. – icza Aug 14 '18 at 12:14
  • Corrected the error in the inner loop, which fixed it. Thanks for your reply and suggestions, which will help me later. – Yoda Aug 15 '18 at 03:15