
I have the following code in my main method, and when I iterate through the Set and print the values, they come out already sorted. What's the reason?

Set<Integer> set = new HashSet<Integer>();
set.add(2);
set.add(7);
set.add(3);
set.add(9);
set.add(6);

for(int i : set) {
    System.out.println(i);
}

Output:

2
3
6
7
9
Eran
Sddf
  • You could try to add the numbers in a different order and see if the output stays the same. But as the answers say, there is no guarantee about the ordering, and so it's just a coincidence :) – MinecraftShamrock Jan 13 '15 at 20:00
  • Contrary to the current lot of answers, it's not really an accident or a coincidence; but you'll find it won't generally happen if you put some bigger numbers into the set, or a mixture of negative and positive numbers. – Dawood ibn Kareem Jan 13 '15 at 20:20

3 Answers


That's just coincidence. A HashSet does not preserve or guarantee any ordering.

From the HashSet Javadoc: "It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time."

Sotirios Delimanolis
  • It's not a coincidence. It's a matter of the `hashCode` implementation in `Integer`. – NiematojakTomasz Jan 13 '15 at 19:51
  • @NiematojakTomasz `HashSet` performs additional hashing to distribute the `hashCode` across the limited number of buckets it maintains. That implementation is abstracted away, which is why I'm saying it's coincidence. – Sotirios Delimanolis Jan 13 '15 at 19:54
  • @NiematojakTomasz: It is a coincidence. The hash smearing done in `HashSet`'s internals, for example, would (usually) ruin the ordering of integers. – Louis Wasserman Jan 13 '15 at 19:54
  • Thanks guys. So, if I have a set of objects of a class that I've created, will the items be sorted by hashCode when I iterate through the set? – Sddf Jan 13 '15 at 19:57
  • @LouisWasserman, @SotiriosDelimanolis I agree that this isn't something you should depend on, as it is an implementation detail, and it won't work for every set of integers. But the effect is a result of the load factor, the effective capacity of the underlying array, and the fact that all elements are positive integers smaller than the capacity. Not clearly coincidental. I noticed that `HashMap` uses a trick like `(h = key.hashCode()) ^ (h >>> 16)`, but it doesn't affect small values. – NiematojakTomasz Jan 13 '15 at 19:59
  • @MariaGabriela: Usually not. You shouldn't depend on this behavior, especially given that it's allowed to change between Java releases. – Louis Wasserman Jan 13 '15 at 20:06

I'm not sure calling it a coincidence is the right answer. There is no chance involved. It results from the hash functions being used, the small values you put in the HashSet, and the small number of elements you put in the Set.

  • For Integer, hashCode() is the int value of the Integer.

  • HashMap (and HashSet) apply an additional hash to the value returned by hashCode, but this additional hashing doesn't change the value for numbers as small as the ones you added to the HashSet.

  • Finally, the bucket that each integer is put into is the modified hash code modulo the capacity of the HashSet. The initial capacity of a HashSet/HashMap is 16.

  • Therefore 2 is added to bucket 2, 7 is added to bucket 7, etc...

  • When you iterate over the elements of the HashSet, the buckets are visited in order, and since each bucket has at most a single element, you get the numbers sorted.

Here is how the bucket is computed:

int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);

static int hash(int h) {
    // For the small integers you put in the set, every shifted value
    // XORed with h is 0, so hash(h) returns h unchanged.
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

static int indexFor(int h, int length) {
    // table.length is the initial capacity (16), so for 0 <= h <= 15,
    // indexFor(h, table.length) == h.
    return h & (length - 1);
}

Therefore, the buckets of 2,7,3,9,6 are 2,7,3,9,6 respectively.
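
The calculation above can be checked with a small standalone sketch. It copies the two helper methods quoted above into a hypothetical class named BucketDemo (the class name and the bucketOf helper are illustrative and not part of the JDK) and assumes the default table length of 16.

```java
// Hypothetical demo class; hash and indexFor are copied from the snippet
// quoted above rather than called from the real HashMap.
class BucketDemo {
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Illustrative helper: bucket index of an Integer key for a given table length.
    static int bucketOf(int key, int tableLength) {
        // Integer.hashCode() returns the int value itself.
        return indexFor(hash(Integer.valueOf(key).hashCode()), tableLength);
    }

    public static void main(String[] args) {
        int[] keys = {2, 7, 3, 9, 6};
        for (int key : keys) {
            // Each key from the question lands in the bucket with the same index.
            System.out.println(key + " -> bucket " + bucketOf(key, 16));
        }
    }
}
```

Because every key maps to its own index and no two keys share a bucket, walking the buckets from 0 to 15 yields the keys in ascending order.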

The enhanced for loop you use to iterate over the HashSet visits the buckets in order, and for each bucket iterates over its entries (which are stored in a linked list). Therefore, for your input, 2 is visited first, followed by 3, 6, 7 and 9.

If you add numbers higher than 15, both the hash method and the indexFor method (assuming you didn't change the default capacity of the HashSet) can place them in buckets that no longer follow numeric order, so the HashSet iterator may no longer return them sorted.
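
As a rough illustration of that last point, the sketch below applies the same two quoted helper methods to a few keys above 15. The class name LargeKeyBuckets and the sample keys are arbitrary choices for illustration, and the table length is assumed to stay at 16 (no resize).

```java
// Hypothetical demo class; the two helpers are copies of the quoted
// snippet, not calls into the real HashMap.
class LargeKeyBuckets {
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Illustrative helper: bucket of an int key in a 16-slot table.
    static int bucketOf(int key) {
        return indexFor(hash(key), 16);
    }

    public static void main(String[] args) {
        int[] keys = {17, 32, 63, 92}; // arbitrary sample keys above 15
        for (int key : keys) {
            System.out.println(key + " -> bucket " + bucketOf(key));
        }
        // With these helpers, 92 maps to bucket 9 and 63 maps to bucket 12,
        // so a bucket-order walk would reach 92 before 63.
    }
}
```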

Eran
  • Good information, +1. You've given code from the JDK 6 hashing algorithm. Interestingly, even though the bucket calculation changes from JDK 6 to 7 to 8, the buckets for integers 0-15 remain the same. However, the bucket calculated will differ in Java 8 from that of Java 6 and 7 for integers 16 and up. – rgettman Jan 13 '15 at 20:25
  • @rgettman I just Googled the code and got the JDK 6 algorithm. – Eran Jan 13 '15 at 20:28
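
The difference described in these comments can be sketched by putting the two spreading steps side by side. This is only an approximation under stated assumptions: jdk6Bucket copies the helper methods quoted in the answer, xorShift16Bucket uses the `(h = key.hashCode()) ^ (h >>> 16)` expression quoted in an earlier comment, and both are assumed to mask with a table length of 16. The class and method names are hypothetical.

```java
// Hypothetical comparison class; neither method calls into the real HashMap.
class SpreadComparison {
    // Supplemental hash and masking as quoted in the answer above.
    static int jdk6Bucket(int h, int length) {
        h ^= (h >>> 20) ^ (h >>> 12);
        h = h ^ (h >>> 7) ^ (h >>> 4);
        return h & (length - 1);
    }

    // Spreading step of the form h ^ (h >>> 16), assumed here to be
    // followed by the same power-of-two masking.
    static int xorShift16Bucket(int h, int length) {
        return (h ^ (h >>> 16)) & (length - 1);
    }

    public static void main(String[] args) {
        for (int key = 0; key <= 20; key++) {
            System.out.println(key + ": " + jdk6Bucket(key, 16)
                    + " vs " + xorShift16Bucket(key, 16));
        }
    }
}
```

For keys 0 through 15 both columns equal the key itself. From 16 upward they diverge: for example, 16 maps to bucket 1 with the first method and to bucket 0 with the second.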

This is just an accident. I tried:

final Set<Integer> set = new HashSet<Integer>();
set.add(2);
set.add(17);
set.add(32);
set.add(92);
set.add(63);

and I got 17 32 2 92 63. The output was not sorted, because HashSet preserves neither sorted order nor insertion order.

fastcodejava
  • @MariaGabriela If you want the numbers to be sorted, all you have to do is use a `TreeSet` instead. – Paul Boddington Jan 13 '15 at 20:10
  • @fastcodejava What is the time complexity of using TreeSet to sort the numbers? – Sddf Jan 13 '15 at 20:24
  • @MariaGabriela Each `add`, `remove` or `contains` is `O(log(n))`; iteration over the whole set is `O(n)`. http://stackoverflow.com/questions/2759256/what-is-the-time-complexity-of-treeset-iteration http://docs.oracle.com/javase/7/docs/api/java/util/TreeSet.html – NiematojakTomasz Jan 13 '15 at 21:12
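
A minimal sketch of the `TreeSet` suggestion follows, reusing the values from this answer. The class name SortedSetDemo and the sortedIteration helper are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class showing that TreeSet iterates in ascending order.
class SortedSetDemo {
    // Illustrative helper: adds the values to a TreeSet and returns them
    // in the set's iteration order.
    static List<Integer> sortedIteration(int... values) {
        Set<Integer> set = new TreeSet<Integer>();
        for (int value : values) {
            set.add(value); // O(log n) per insertion
        }
        return new ArrayList<Integer>(set); // O(n) walk in ascending order
    }

    public static void main(String[] args) {
        // Prints [2, 17, 32, 63, 92] regardless of insertion order.
        System.out.println(sortedIteration(2, 17, 32, 92, 63));
    }
}
```

Unlike `HashSet`, the ordering here is part of the documented contract of `TreeSet`, so it does not depend on the JDK version or on how large the values are.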