4

I am trying to export some keys as byte[] from a HashSet to a HashMap and use the HashMap to store pairs of data. However, I am running into a problem: the size of the set is larger than the size of the HashMap. I would like to know what causes this, since the HashMap is filled inside a for-each loop that iterates over every element of the HashSet, which has 2^20 elements. So I expect the size of the HashMap to be 2^20 as well.

What I am trying to store in the HashMap is two byte arrays. I am currently working on the meet-in-the-middle attack on 2DES. My encryption is properly implemented, and so is my DES key generator, so I am able to generate 2^20 keys (only 20 bits of each key are effective). However, when I try to put the keys in the HashMap, its size is not the same as the size of the HashSet, which doesn't make sense to me.

for (int i = 0; i < Math.pow(2, 20); i++) {
    possibleKeySet.add(generateDesKey());
}

for (byte[] key : possibleKeySet) {
    intermediateCipher.put((encrypt(key, plainText)).toString(), key);
}

Output: Set size: 1048576 Map size: 1048295

PS: intermediateCipher is my HashMap.

Update: I have tried to implement hashCode and equals, but I'm not sure how to implement hashCode.

class ByteArray {

    private byte[] key;

    ByteArray(byte[] key) {
        this.key = key;
    }

    byte[] getKey() {
        return key;
    }

    public boolean equals(Object obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(obj);
        oos.writeObject(obj);
        oos.flush();
        byte[] data = bos.toByteArray();
        return key.equals(data);
    }

    public int hashCode() {
        // what should I write here?
    }

}
user8231110

3 Answers

3

The probability of a hash collision is (as can be seen here):

[Graph: probability of a hash collision]

The average number of collisions for a hash space of n possible values, given k inputs, is approximately:

N(n,k)~=k(k-1)/(2n)

Given an n of 2^32 and a k of 2^20, the expected number of collisions is:

(2^20) * (2^20 - 1) / (2 * 2^32)
~= 2^40/2^33
~= 2^7
~= 128

The number of collisions you saw was 1048576 - 1048295 = 281. Given this, I assume that the actual entropy of the value returned by encrypt(key, plainText) is roughly 31 bits (rather than 32), which would give an expected 2^40/2^32 = 256 collisions.
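The estimate above can be checked numerically. The following is a minimal sketch, not part of the original code; the class and method names are made up for illustration. It evaluates k(k-1)/(2n) for a 32-bit and a 31-bit hash space with k = 2^20 inputs.

```java
class CollisionEstimate {

    // Approximate expected number of colliding pairs when k inputs
    // are hashed into n possible values: k(k-1)/(2n).
    static double expectedCollisions(double n, double k) {
        return k * (k - 1) / (2 * n);
    }

    public static void main(String[] args) {
        double k = Math.pow(2, 20);
        // 32-bit hash space: about 128
        System.out.println(expectedCollisions(Math.pow(2, 32), k));
        // 31-bit hash space: about 256
        System.out.println(expectedCollisions(Math.pow(2, 31), k));
    }
}
```

The observed 281 is closer to the 31-bit estimate than to the 32-bit one, which is what the assumption about entropy rests on.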


To get the desired number of keys, you can keep generating values until the map reaches the desired size. This may make the method take a long time to complete:

while (intermediateCipher.size() < Math.pow(2,20)) {
    byte[] key = generateDesKey();
    intermediateCipher.put((encrypt(key, plainText)).toString(), key);
}
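For experimenting with this loop in isolation, here is a self-contained sketch under stated assumptions. `generateKey` and `toyEncrypt` are hypothetical stand-ins for `generateDesKey()` and `encrypt(key, plainText)`; the toy cipher is a plain XOR, not DES. It also assumes the encryption result is a `byte[]` and uses a Base64 string of its contents as the map key, so that the key depends on the bytes rather than on the array object.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

class FillUntilFull {

    private static final SecureRandom RANDOM = new SecureRandom();

    // Hypothetical stand-in for generateDesKey(): 8 random bytes.
    static byte[] generateKey() {
        byte[] key = new byte[8];
        RANDOM.nextBytes(key);
        return key;
    }

    // Hypothetical stand-in for encrypt(key, plainText): XOR only, NOT real DES.
    static byte[] toyEncrypt(byte[] key, byte[] plainText) {
        byte[] out = new byte[plainText.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) (plainText[i] ^ key[i % key.length]);
        }
        return out;
    }

    // Keeps generating keys until the map holds `target` distinct ciphertexts.
    static Map<String, byte[]> fill(int target, byte[] plainText) {
        Map<String, byte[]> table = new HashMap<>();
        while (table.size() < target) {
            byte[] key = generateKey();
            // Content-based map key (an assumption made for this sketch).
            String cipherText = Base64.getEncoder().encodeToString(toyEncrypt(key, plainText));
            table.put(cipherText, key);
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, byte[]> table = fill(1 << 10, new byte[8]);
        System.out.println(table.size());
    }
}
```

The target is kept small (2^10) so the sketch finishes quickly; the loop only stops once the map has reached the requested size.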

flakes
  • Very interesting, but how do I avoid the collision? – user8231110 Nov 22 '18 at 23:30
  • Even with a hash collision, `HashSet` and `HashMap` use equals to resolve it (first the hashCode bucket, then a list with equals checking). In my answer, I provide a full solution to the problem. – ByeBye Nov 22 '18 at 23:32
  • @user8231110 I added a simple way to get the desired size! – flakes Nov 22 '18 at 23:35
  • but possibleKeySet.add(generateDesKey()); is a HashSet, so I cannot store it in a byte[] – user8231110 Nov 22 '18 at 23:42
  • @user8231110 Just call `intermediateCipher.values()` to get them. After the while loop you could even do: `possibleKeySet.addAll(intermediateCipher.values())` – flakes Nov 22 '18 at 23:44
  • but intermediateCipher does not have any values prior to adding the keys in the HashSet – user8231110 Nov 22 '18 at 23:49
  • 1
    "Very interesting, but how do I avoid the collision?" - You can't. You don't. You detect the collision ... and generate another key to replace it. @flakes - the collisions are in `possibleKeySet`! – Stephen C Nov 22 '18 at 23:56
  • @user8231110 why do you need them in the `HashSet` first? You can skip creating the `possibleKeySet` altogether. The `intermediateCipher` will contain both the set of hashes produced by `encrypt(key, plainText)` and the original `byte[]` key values. – flakes Nov 22 '18 at 23:58
  • So, I can just have a while loop, which iterates until the size of intermediateCipher is 2^20. Within the loop I call intermediateCipher.put((encrypt(generateDesKey(), plainText)).toString(), generateDesKey()); – user8231110 Nov 23 '18 at 00:03
  • @user8231110 that's close, but not quite. I assume `generateDesKey()` produces a different value every call. If you call it twice you will get two different keys. Take a look at the last section of my answer. – flakes Nov 23 '18 at 00:06
  • You are right, it generates two different keys. By looking at your last section: while (intermediateCipher.size() < Math.pow(2, 20)) { byte[] key = generateDesKey(); intermediateCipher.put((encrypt(key, plainText)).toString(), key); } – user8231110 Nov 23 '18 at 00:12
  • @user8231110 looks good to me. Next up you should write some unit tests to fully verify that it works! – flakes Nov 23 '18 at 00:20
1

If your Set and HashMap have different sizes, the cause is probably a collision on the map keys. It looks like your function encrypt(key, plainText) returns duplicates.

Try with:

for (byte[] key : possibleKeySet) {
    Object oldValue = intermediateCipher.put((encrypt(key, plainText)).toString(), key);
    if(oldValue != null) {
        System.out.println("Duplicated!");
    }
}

It will probably print "Duplicated!" a number of times.

Note also that generateDesKey() can produce the same value more than once even though the results are stored in a Set, because two arrays in Java are equal only if they are the same object.
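The point about array equality can be seen in a small sketch (the class and method names are made up for illustration). Two arrays with the same bytes are different elements for a `HashSet<byte[]>`, and `toString()` on a `byte[]` is not based on its contents either, which matters when such a string is used as a map key.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ArrayIdentityDemo {

    // Size of a HashSet<byte[]> after adding two arrays with the same content.
    static int setSizeForEqualContent() {
        Set<byte[]> set = new HashSet<>();
        set.add(new byte[] {1, 2, 3});
        set.add(new byte[] {1, 2, 3});
        return set.size();
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3};
        System.out.println(a.equals(b));              // false: compares object identity
        System.out.println(Arrays.equals(a, b));      // true: compares contents
        System.out.println(setSizeForEqualContent()); // 2: both arrays are kept
        System.out.println(a.toString());             // "[B@" followed by a hash code, not the bytes
    }
}
```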

What can you do? Create a custom object:

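import java.util.Arrays;

class ByteArray {

    private final byte[] key;

    ByteArray(byte[] key) {
        this.key = key;
    }

    byte[] getKey() {
        return key;
    }

    @Override
    public boolean equals(Object obj) {
        // Two ByteArray objects are equal when their arrays hold the same bytes
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof ByteArray)) {
            return false;
        }
        return Arrays.equals(key, ((ByteArray) obj).key);
    }

    @Override
    public int hashCode() {
        // Hash the array contents so that equal arrays produce the same hash code
        return Arrays.hashCode(key);
    }

}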
ByeBye
  • Yes, you are right. I do have some duplicates. How do I then make sure that the keys in the Set are unique? Using a Set makes no sense if its values aren't unique. – user8231110 Nov 22 '18 at 23:15
  • You can use a new custom object which will store your array as a field and then override hashCode and equals to check what is INSIDE array. – ByeBye Nov 22 '18 at 23:16
  • They are unique only by the design of an array's equals, not unique in the sense the author's algorithm needs; an array is not fit to be used as a set element here. – ByeBye Nov 22 '18 at 23:19
  • I think the symptoms mean that your `generateDesKey` is generating non-unique keys ... as well. – Stephen C Nov 22 '18 at 23:30
  • For generating a key, I'm using SecureRandom() – user8231110 Nov 22 '18 at 23:35
  • 1
    What is your question now? I provided a solution and explained why your set size is different from the HashMap size. If you have another question, such as help with writing a better function, it should be a new separate question. – ByeBye Nov 22 '18 at 23:37
  • I have tried to implement equals and hashCode, but I'm not sure how to implement hashCode. I have given it a try and hope you can help me. – user8231110 Nov 24 '18 at 17:34
  • Use `Arrays.hashCode` and `Arrays.equals` for that. – ByeBye Nov 25 '18 at 19:59
0

A Set is implemented so that every item in it must be unique; it does not allow duplicates. That means that when you put two equal objects into a set, only one of them ends up in it.

Your generateDesKey() method is probably not returning 2^20 unique values, so the resulting set contains fewer than 2^20 items.

You can check the size of possibleKeySet before copying the values to the HashMap, like this:

System.out.println(possibleKeySet.size());

or just by using a debugger.

m.antkowicz