-1

When I use Apache Commons Codec's DigestUtils.md5Hex to compute the MD5 of the same InputStream twice, I get two different results. The example code is below:

public static void main(String[] args) {
    String data = "D:\\test.jpg";
    File file = new File(data);
    InputStream is = null;
    try {
        is = new FileInputStream(file);
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
    String digest = null, digest2 = null;
    try {
        System.out.println(is.hashCode());
        digest = DigestUtils.md5Hex(is);
        System.out.println(is.hashCode());

        digest2 = DigestUtils.md5Hex(is);
        System.out.println(is.hashCode());

    } catch (IOException e) {
        e.printStackTrace();
    }
    System.out.println("Digest = " + digest);
    System.out.println("Digest2 = " + digest2);
}

The output is:

1888654590
1888654590
1888654590
Digest = 5cc6c20f0b3aa9b44fe952da20cc928e
Digest2 = d41d8cd98f00b204e9800998ecf8427e

Thank you for your answers!

iameven
  • I think (but I may be very wrong, hence this is a comment and not an answer) that you shouldn't try hashing a `Stream`. A stream is a data source, and you usually try to hash the data itself, not where it's coming from. So, I think you should either read all the data from the file and hash the result (a byte[] probably), or hash the `File` directly. As to why the hash is different, I can guess that it is because the stream changes its state while you read it, so the hash changes too. – francesco foresti Jul 03 '15 at 07:51
  • @francesco Yes! Using a byte[] is the better way. Thank you. – iameven Jul 03 '15 at 08:34

3 Answers

3

An InputStream can be traversed only once. The first call reads it to the end and returns the MD5 of your input file. When you call md5Hex the second time, the InputStream is already at end-of-file, so digest2 is the MD5 of empty input.
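
The effect described above can be reproduced without a file or the Commons Codec dependency. The following is only a sketch: the class name `StreamExhaustionDemo` and its `md5Hex` helper are hypothetical, JDK-only stand-ins for `DigestUtils.md5Hex(InputStream)`, and an in-memory stream takes the place of the `FileInputStream`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StreamExhaustionDemo {

    // Hypothetical stand-in for DigestUtils.md5Hex(InputStream):
    // reads until end-of-stream and returns the MD5 as lowercase hex.
    static String md5Hex(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        // In-memory stream used in place of the question's FileInputStream.
        InputStream is = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));

        String first = md5Hex(is);   // consumes every byte of the stream
        String second = md5Hex(is);  // nothing left to read: hashes zero bytes
        String empty = md5Hex(new ByteArrayInputStream(new byte[0]));

        System.out.println("first  = " + first);
        System.out.println("second = " + second);
        System.out.println("second equals MD5 of empty input: " + second.equals(empty));
    }
}
```

The second digest matches the digest of a zero-length stream, which is the same value as Digest2 in the question.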

Tagir Valeev
3

d41d8cd98f00b204e9800998ecf8427e is the MD5 hash of the empty string ("").

That is because `is` is a stream: once it has been read (in DigestUtils.md5Hex(is)), the "cursor" is at the end of the stream, where there is no more data, so any further read yields no bytes (read() signals end-of-stream by returning -1).

I suggest reading the contents of the stream to a byte[] instead, and hashing that.
For how to get a byte[] from an InputStream, see this question.
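
As a rough sketch of that suggestion, the stream can be drained into a byte[] once and the array hashed as often as needed. The class `HashBytesDemo` and its helpers `readAll` and `md5Hex` are hypothetical names using only the JDK; with Commons Codec on the classpath, `DigestUtils.md5Hex(byte[])` would take the place of the `md5Hex` helper. An in-memory stream again stands in for the `FileInputStream`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashBytesDemo {

    // Drains the stream into memory. Suitable for small inputs only,
    // because the whole content is held in a single array.
    static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical JDK-only counterpart of DigestUtils.md5Hex(byte[]).
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        InputStream is = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        byte[] data = readAll(is);        // the stream is consumed exactly once

        String digest = md5Hex(data);     // the array can be hashed repeatedly
        String digest2 = md5Hex(data);
        System.out.println("Digest  = " + digest);
        System.out.println("Digest2 = " + digest2);
        System.out.println("equal: " + digest.equals(digest2));
    }
}
```

Unlike the stream, the array has no read position, so both calls see the same bytes and return the same digest.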

Siguza
  • Thank you. It was silly of me not to recognize the MD5 value of `""` :( Now I know the reason for this problem. – iameven Jul 03 '15 at 08:16
0

You cannot move back within an InputStream, so invoking

DigestUtils.md5Hex(is);

twice on the same stream does not hash the same data both times. It is better to read the stream into a byte array and use:

public static String md5Hex(byte[] data)
Artur