
This may be a strange question, but it is based on some results I am seeing with Java's Map: is element retrieval faster in a HashMap when the map is smaller?

I have a piece of code that uses the containsKey and get(key) methods of a HashMap, and it seems to run faster when the number of elements in the map is smaller. Is that so?

My understanding is that a HashMap uses a hash function to map a key to a certain slot (bucket) of the map, and that in some implementations a bucket holds a reference to a linked list (because different keys can hash to the same bucket), or, when the map is implemented fully statically, to other slots of the array.

Is this correct - can retrieval be faster if the map has fewer elements?

I need to extend my question, with a concrete example.

I have two cases; in both, the total number of elements is the same.

  • In the first case, I have 10 HashMaps; I am not aware how the elements are distributed among them. The execution time of that part of the algorithm is 141 ms.
  • In the second case, I have 25 HashMaps, with the same total number of elements. The execution time of the same algorithm is 69 ms.

In both cases, I have a for loop that goes through each of the HashMaps, looks for the same keys, and gets the values if they are present.

Can it be that the total execution time is smaller because each individual search inside a smaller HashMap is faster, and therefore so is their sum?

I know that this is very strange, but is something like this somehow possible, or am I doing something wrong?

A Map<Integer, Double> is being used. It is hard to tell what the distribution of elements is, since this is actually an implementation of the KMeans clustering algorithm and the elements are representations of cluster centroids, so they mostly depend on the initialization of the algorithm. Also, the total number of elements will not usually be the same; I tried to simplify the problem, sorry if that was misleading.
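
Roughly, the part I am timing looks like this minimal sketch (the class name, method name and the key collection are placeholders, not the actual algorithm code):

import java.util.List;
import java.util.Map;

public class LookupTiming {
    // Sketch only: 'maps' stands for the collection of 10 or 25 HashMaps,
    // 'keysToFind' for the keys looked up in every map
    static double lookupAll(List<Map<Integer, Double>> maps, List<Integer> keysToFind) {
        double sum = 0.0;
        long start = System.nanoTime();
        for (Map<Integer, Double> map : maps) {
            for (Integer key : keysToFind) {
                if (map.containsKey(key)) {   // first lookup
                    sum += map.get(key);      // second lookup of the same key
                }
            }
        }
        System.out.println("took " + (System.nanoTime() - start) / 1_000_000 + " ms");
        return sum;
    }
}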

Kobe-Wan Kenobi
  • I know what you are talking about but it would be better if you back it with some results – Kumar Abhinav Sep 24 '14 at 21:09
  • I will add some concrete details. – Kobe-Wan Kenobi Sep 24 '14 at 21:20
  • Any data structure which is smaller will fit into your CPU caches better and be faster. This is not so much a feature of your data structure, rather how CPUs work. – Peter Lawrey Sep 24 '14 at 21:27
  • Thanks, this is a great observation, very helpful, generally. You have +1 from me. Can you please consider my concrete problem? – Kobe-Wan Kenobi Sep 24 '14 at 21:29
  • Please provide information about the type of elements you are using, whether hashCode/equals is overridden, and how many elements there are in total. If you are not aware how the elements are distributed, can it happen that in the first case all elements are placed in the map that is checked last, and in the second case they are all in the first? – Aivean Sep 24 '14 at 22:04
  • [There is another thread](http://stackoverflow.com/questions/7115445/what-is-the-optimal-capacity-and-load-factor-for-a-fixed-size-hashmap), where someone really put some effort into finding an optimal balance between map size, capacity and load factor. He also wrote a tl;dr summary; just skip to the end. – SME_Dev Sep 24 '14 at 22:18

2 Answers


The number of collisions is decisive for a slowdown.

Assume an array of some size; the hash code modulo that size then points to an index where the object is put. Two objects with the same index collide.
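
As a rough illustration (the real HashMap additionally spreads the hash bits and indexes a power-of-two table, so this is only the idea, not the actual code):

public class BucketIndexDemo {
    public static void main(String[] args) {
        int capacity = 16;  // size of the backing array
        // Integer's hashCode() is simply its value
        int index1 = Integer.valueOf(42).hashCode() % capacity;  // 42 % 16 = 10
        int index2 = Integer.valueOf(58).hashCode() % capacity;  // 58 % 16 = 10
        System.out.println(index1 == index2);  // true: the two keys collide in the same bucket
    }
}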

Having a large capacity (array size) with respect to the number of elements helps.

With HashMap there are overloaded constructors with extra settings.

public HashMap(int initialCapacity,
               float loadFactor)

Constructs an empty HashMap with the specified initial capacity and load factor.

You might experiment with that.
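
For example, if you expect roughly 1000 entries per map (an illustrative number, not taken from the question), you could size each map up front:

import java.util.HashMap;
import java.util.Map;

public class PresizedMapExample {
    public static void main(String[] args) {
        int expectedEntries = 1000;   // illustrative, use your own estimate
        float loadFactor = 0.75f;
        // capacity chosen so the map stays below the resize threshold
        int initialCapacity = (int) Math.ceil(expectedEntries / loadFactor);

        Map<Integer, Double> map = new HashMap<>(initialCapacity, loadFactor);
        System.out.println("created map with initial capacity " + initialCapacity);
    }
}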

For a specific key class used with a HashMap, having a good hashCode implementation helps too. Designing hash functions is a mathematical field of its own.
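
For instance, if the key were a small value class of your own rather than Integer (whose hashCode is already fine), a reasonable hashCode/equals pair could look like this hypothetical sketch (CentroidKey and its fields are made up for illustration):

import java.util.Objects;

final class CentroidKey {
    private final int clusterId;   // hypothetical fields, just for illustration
    private final int dimension;

    CentroidKey(int clusterId, int dimension) {
        this.clusterId = clusterId;
        this.dimension = dimension;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CentroidKey)) return false;
        CentroidKey other = (CentroidKey) o;
        return clusterId == other.clusterId && dimension == other.dimension;
    }

    @Override
    public int hashCode() {
        return Objects.hash(clusterId, dimension);  // combines both fields into one hash
    }
}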

Of course using less memory helps on the processor / physical memory level, but I doubt it has an influence in this case.

Joop Eggen

Does your timing take into account only the cost of get / containsKey, or are you also performing puts in the timed code section? If so, and if you're using the default constructor (initial capacity 16, load factor 0.75), then the larger hash tables are going to need to resize themselves more often than the smaller hash tables will. Like Joop Eggen says in his answer, try playing around with the initial capacity in the constructor: e.g. if you know that you have N elements, set the initial capacity to N / number_of_hash_tables or something along those lines. This ought to result in both the smaller and larger hash tables having sufficient capacity that they won't need to be resized.
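
For example (a sketch with made-up numbers for N and the number of maps):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PresizedMaps {
    public static void main(String[] args) {
        int totalElements = 10000;   // hypothetical N
        int numberOfMaps = 25;
        // divide by the default load factor so the table never crosses the resize threshold
        int perMapCapacity = (int) Math.ceil(totalElements / (double) numberOfMaps / 0.75);

        List<Map<Integer, Double>> maps = new ArrayList<>(numberOfMaps);
        for (int i = 0; i < numberOfMaps; i++) {
            maps.add(new HashMap<>(perMapCapacity));   // no resizing expected during puts
        }
        System.out.println("each map pre-sized to capacity " + perMapCapacity);
    }
}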

Zim-Zam O'Pootertoot