
I'm using `MessageDigest` to compute the SHA-256 hash of files as follows:

    byte[] hash = new byte[32];
    MessageDigest digest = MessageDigest.getInstance("SHA-256");
    try (InputStream input = Files.newInputStream( Paths.get(file.getPath()) )) {
        byte[] buf = new byte[8192];
        int len;
        while ( (len=input.read(buf)) > 0 ) {
            digest.update(buf, 0, len);
        }
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    hash = digest.digest();
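
For reference, here is a minimal sketch of the same loop packaged as the self-contained `hash(File)` method mentioned in the comments below. The class name `FileHasher` is hypothetical. I/O errors propagate to the caller instead of being printed and ignored, and the digest is computed only once the whole file has been read.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class FileHasher {

    // Hypothetical helper: returns the 32-byte SHA-256 digest of the file's contents.
    // Any IOException is thrown to the caller, so no digest is ever produced
    // from a partially read file.
    public static byte[] hash(File file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream input = Files.newInputStream(file.toPath())) {
            byte[] buf = new byte[8192];
            int len;
            // read() returns -1 at end of stream
            while ((len = input.read(buf)) != -1) {
                digest.update(buf, 0, len);
            }
        }
        return digest.digest();
    }
}
```

Hashing the same unchanged file twice with this method yields two arrays for which `java.util.Arrays.equals` returns `true`. Comparing them with `==` or `equals` would return `false`, because those compare the array objects rather than their contents.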

The "simplified" idea is: I hash a file, take only the first two bytes, and send them to a server. The server checks whether its DB already has this "shorthash" (i.e., those two bytes). If it does, the client isn't allowed to send the file; otherwise the file is sent and saved in the DB along with its shorthash.
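
A minimal sketch of the shorthash step. The post doesn't say how the two bytes are encoded, so this assumes a 4-character lowercase hex string, which stores and compares reliably in a text column, unlike a raw `byte[]`. An `int` variant is shown as an alternative. `ShortHash`, `shortHash` and `shortHashAsInt` are hypothetical names.

```java
class ShortHash {

    // Hypothetical helper: the first two bytes of the digest as lowercase hex, e.g. "ba78".
    public static String shortHash(byte[] hash) {
        requireTwoBytes(hash);
        // & 0xFF turns the signed byte into an unsigned value 0..255
        return String.format("%02x%02x", hash[0] & 0xFF, hash[1] & 0xFF);
    }

    // Alternative: the same two bytes packed into an int in the range 0..65535.
    public static int shortHashAsInt(byte[] hash) {
        requireTwoBytes(hash);
        return ((hash[0] & 0xFF) << 8) | (hash[1] & 0xFF);
    }

    private static void requireTwoBytes(byte[] hash) {
        if (hash.length < 2) {
            throw new IllegalArgumentException("hash must have at least 2 bytes");
        }
    }
}
```

Either way, a two-byte prefix has only 65,536 possible values, so different files will regularly share a shorthash.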

The problem is: if I hash the same file twice, I don't get the same hash, and I have no idea why.

Ablia
  • What is `file`? Does it refer to the exact same file both times? –  Jul 30 '18 at 12:01
  • You actually generate a hash after an exception? And using a mere two bytes to try to uniquely identify files is misguided at best. Hashing only 200 files with 16 bits has a 26% chance of a collision... http://everydayinternetstuff.com/2015/04/hash-collision-probability-calculator/ – Andrew Henle Jul 30 '18 at 12:01
  • The code is correct, and it gives the same hash for a given file. If it does not, either the file content is changing or you are not printing the hash correctly (encode it as Base64 or hex before printing it). – Robert Jul 30 '18 at 12:02
  • @intentionallyleftblank: `file` is my input. This code is actually the body of a function `public byte[] hash(File file)`, which returns the byte array `hash` at the end. I've tried calling this function twice with the same file, and it gave me different outputs. – Ablia Jul 30 '18 at 12:20
  • @AndrewHenle: you're right, this should be inside the try/catch. I know this might look stupid, but I didn't explain the whole process here. Don't worry, there are several steps between the shorthash comparison and the actual decision (including a comparison of the full hash). – Ablia Jul 30 '18 at 12:20
  • @Robert: I give it exactly the same file, so I guess the printing could be the problem. I'm just printing the byte array: `System.out.println(hash)`. But I only noticed the problem because my DB wouldn't find the shorthash, which means even the DB thinks they're not the same hash. Here's my SQL statement for that: `"SELECT owner,filename FROM files WHERE shorthash LIKE '" + shorthash +"'"` – Ablia Jul 30 '18 at 12:21
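
Andrew Henle's 26% figure follows from the birthday-problem approximation p ≈ 1 − exp(−n(n−1)/(2N)), with n = 200 files and N = 2^16 = 65,536 possible shorthashes. A small sketch to check it (the class `CollisionOdds` and its method are hypothetical names):

```java
import java.util.Locale;

class CollisionOdds {

    // Approximate probability that at least two of n items share a value
    // when each value is drawn uniformly from `space` possibilities.
    public static double collisionProbability(long n, double space) {
        return 1.0 - Math.exp(-(double) n * (n - 1) / (2.0 * space));
    }

    public static void main(String[] args) {
        double p = collisionProbability(200, 65536); // 16-bit shorthash
        // Locale.ROOT keeps the decimal point independent of the system locale
        System.out.printf(Locale.ROOT, "%.3f%n", p); // prints 0.262
    }
}
```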


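Thanks to Robert, it seems it was just a printing problem. I was printing it like this and got two different strings beginning with `[B@`. That is what `Object.toString()` returns for a `byte[]`: the type name `[B`, an `@`, and the array object's identity hash code, not its contents. Two different arrays therefore print differently even when their contents are equal.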

    System.out.println(hash);

Printing it this way instead, I get two arrays of numbers that are exactly the same:

    // needs: import java.util.Arrays;
    System.out.println(Arrays.toString(hash));
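
As Robert suggested, hex encoding is the usual way to display a digest. The following is a minimal sketch. The class `HexPrint` and the method `toHex` are hypothetical names. On Java 17+, `java.util.HexFormat.of().formatHex(hash)` produces the same lowercase string. The sketch also shows the difference between comparing arrays by identity and by contents.

```java
import java.util.Arrays;

class HexPrint {

    // Encode each byte as two lowercase hex digits.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] a = {(byte) 0xba, 0x78, 0x16, (byte) 0xbf};
        byte[] b = a.clone(); // same contents, different array object

        System.out.println(a.equals(b));         // false: compares identity
        System.out.println(Arrays.equals(a, b)); // true: compares contents
        System.out.println(Arrays.toString(a));  // [-70, 120, 22, -65]
        System.out.println(toHex(a));            // ba7816bf
    }
}
```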

Now I just have to find out why my database doesn't see them as the same. Since that is probably due to the SQL statement, it is outside the scope of this question.

Ablia