
I usually write something like:

HashMap<String,String> dictionary = new HashMap<String,String>();

I started to think about it, and as far as I know a HashMap is implemented under the hood via a hash table.
The objects are stored in the table, using a hash to find where each one should go.

Does the fact that I do not set a size on the construction of the dictionary make the performance decrease?
I.e. what would be the size of the hash table during construction? Would it need to allocate new memory for the table as elements increase?
Or am I confused about the concept here?
Are the default capacity and load factor adequate, or should I be spending time choosing the actual numbers?

user207421
Cratylus
  • It's not based on a `java.util.Hashtable` (which is synchronized), but it is based on a hash table. – Mat Sep 25 '11 at 09:37
  • @EJP: don't believe everything you read; [read the source](http://www.docjar.com/html/api/java/util/HashMap.java.html) instead. Java's `HashMap` is a very straightforward chaining hash table implementation with not a tree in sight. – Fred Foo Sep 25 '11 at 10:41
  • @EJP, the structure of chained buckets is a tree: not a binary tree, B-tree or spline, but still a tree. As for the gripes: you have allocation on put, worse locality (indirection, i.e. cache misses) on get, and extra indirection on collision. For instance (from a quick Google search): *A study by Zukowski et al.[4] has shown that cuckoo hashing is much faster than chained hashing for small, cache-resident hash tables on modern processors.* http://en.wikipedia.org/wiki/Cuckoo_hashing#Generalizations_and_applications The implementation of IdentityHashMap is what the general hash table in Java should be: low memory footprint and fast. – bestsss Sep 25 '11 at 13:59
  • @bestsss: calling a list a tree is true in a pathological sense and very confusing. The same goes for calling a list of lists a tree. OTOH, I can imagine that `HashMap` isn't extremely fast or memory-efficient since it does do a lot of allocation. – Fred Foo Sep 25 '11 at 15:18
  • @larsmans it's not a list, it's an array of linked lists. This is the first name (tree-based) I recall reading for hash tables many years ago. Knuth calls it chained, indeed. On today's hardware (and for 10+ years now), a linear-probing table with a power-of-2 length is likely to yield the best performance and memory footprint (probably rivaled by a prime-length table, but then it takes a mod (~30 clocks) instead of a right-shift (1 clock) to get the index). – bestsss Sep 25 '11 at 16:45
  • @bestsss it is a structure of buckets or as you correctly say an array of linked lists. One level of parent nodes. No tree structure. Nobody calls it a tree. You are just confusing the issue. And the existence of other techniques and papers doesn't constitute 'gripes' that 'you can read'. – user207421 Sep 25 '11 at 22:45
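To make the "array of linked lists" structure discussed in these comments concrete, here is a minimal sketch of a separate-chaining hash table. It is a hypothetical illustration (the class ChainedMap and its methods are invented here), not the JDK's HashMap source. It assumes a power-of-two table starting at 16 buckets, a 0.75 load factor, and doubling on resize. Note that since JDK 8 the real HashMap also converts very long chains into balanced trees, which this sketch does not do.

```java
import java.util.Objects;

// Hypothetical minimal separate-chaining table; not the JDK's HashMap code.
final class ChainedMap<K, V> {
    // One entry in a bucket's singly linked chain.
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private static final float LOAD_FACTOR = 0.75f; // assumed, same as HashMap's default
    private Node<K, V>[] table;
    private int size;

    @SuppressWarnings("unchecked")
    ChainedMap() {
        table = (Node<K, V>[]) new Node[16]; // power-of-two number of buckets
    }

    // Mix high bits into low bits, since the mask below only looks at low bits.
    private static int spread(Object key) {
        int h = Objects.hashCode(key); // 0 for null, so a null key is allowed
        return h ^ (h >>> 16);
    }

    // With a power-of-two length, a bit mask replaces the slower modulo.
    private static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    V get(Object key) {
        int h = spread(key);
        for (Node<K, V> n = table[indexFor(h, table.length)]; n != null; n = n.next) {
            if (n.hash == h && Objects.equals(n.key, key)) {
                return n.value;
            }
        }
        return null;
    }

    V put(K key, V value) {
        int h = spread(key);
        int i = indexFor(h, table.length);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (n.hash == h && Objects.equals(n.key, key)) {
                V old = n.value; // existing key: overwrite in place
                n.value = value;
                return old;
            }
        }
        table[i] = new Node<>(h, key, value, table[i]); // new key: prepend to the chain
        if (++size > table.length * LOAD_FACTOR) {
            resize();
        }
        return null;
    }

    // Allocate a table twice as large and relink every node into its new bucket.
    @SuppressWarnings("unchecked")
    private void resize() {
        Node<K, V>[] old = table;
        table = (Node<K, V>[]) new Node[old.length * 2];
        for (Node<K, V> head : old) {
            Node<K, V> n = head;
            while (n != null) {
                Node<K, V> next = n.next;
                int i = indexFor(n.hash, table.length);
                n.next = table[i];
                table[i] = n;
                n = next;
            }
        }
    }

    int size() { return size; }

    int capacity() { return table.length; }
}
```

The mask in indexFor only works because the table length is a power of two; that is the cheap-index trick behind bestsss's remark about power-of-2 tables versus taking a mod for prime lengths.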

5 Answers


Does the fact that I do not set a size on the construction of the dictionary make the performance decrease?

Depends on how much you're going to store in the HashMap and how your code will use it afterward. If you can give it a ballpark figure up front, it might be faster, but, as the HashMap API docs put it, "it's very important not to set the initial capacity too high [...] if iteration performance is important", because iteration time is proportional to the capacity (plus the number of entries).

Doing this in non-performance-critical pieces of code would be considered premature optimization. If you're going to outsmart the JDK authors, make sure you have measurements that show that your optimization matters.

what would be the size of the hash table during construction?

According to the API docs, 16.

Would it need to allocate new memory for the table as elements increase?

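Yes. Every time the number of entries exceeds capacity × load factor (the default load factor is 0.75), it allocates a table twice as large and rehashes the existing entries into it.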

Are the default capacity and load adequate

Only you can tell. Profile your program to see whether it's spending too much time in HashMap.put. If it's not, don't bother.

Fred Foo

The nice thing about Java is that it is open-source, so you can pull up the source code, which answers a number of questions:

  1. No, there is no relationship between HashMap and Hashtable. HashMap derives from AbstractMap, and does not internally use a Hashtable for managing data.

  2. Whether or not omitting an explicit size will decrease performance depends on your usage model (or more specifically, how many things you put into the map). The map automatically doubles in size every time a certain threshold is exceeded (0.75 * <current map capacity>, where 0.75 is the default load factor), and the doubling operation is expensive. So if you know approximately how many elements will be going into the map, you can specify a large enough initial size and prevent it from ever needing to allocate additional space.

  3. The default capacity of the map, if none is specified using the constructor, is 16. So it will double its capacity to 32 once it holds more than 12 elements (i.e. when the 13th element is added to the map), then again when the 25th is added, and so on.

  4. Yes, it needs to allocate new memory when the capacity increases. And it's a fairly costly operation (see the resize() and transfer() functions).
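The thresholds in point 3 can be checked with a small model of the growth policy. This is a hypothetical simulation (GrowthModel and resizePoints are invented names), not a call into HashMap, whose table size is not exposed through the public API. It assumes the JDK 8+ rule that a resize happens once the size exceeds capacity * loadFactor; older JDKs differ in minor details.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of HashMap's documented growth policy (not the real class).
final class GrowthModel {
    // Returns the 1-based insertion counts at which a resize would be triggered.
    static List<Integer> resizePoints(int initialCapacity, float loadFactor, int insertions) {
        List<Integer> points = new ArrayList<>();
        int capacity = initialCapacity;
        for (int size = 1; size <= insertions; size++) {
            // Assumed JDK 8+ rule: resize once size exceeds capacity * loadFactor.
            if (size > capacity * loadFactor) {
                points.add(size);
                capacity *= 2; // the table doubles on every resize
            }
        }
        return points;
    }

    public static void main(String[] args) {
        // Defaults: capacity 16, load factor 0.75, inserting 100 entries.
        System.out.println(resizePoints(16, 0.75f, 100)); // [13, 25, 49, 97]
    }
}
```

Each of those resizes copies every existing entry into the new table, which is why presizing can pay off when the final size is known in advance.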

Unrelated to your question but still worth noting, I would recommend declaring/instantiating your map like:

Map<String,String> dictionary = new HashMap<String,String>();

...and of course, if you happen to know how many elements will be placed in the map, you should specify that as well.
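A short sketch of why declaring against the Map interface pays off: the hypothetical helper fill below only knows about Map, so the same code works unchanged with a HashMap or a TreeMap. It also uses the Java 7 diamond operator mentioned in the comments.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class MapInterfaceDemo {
    // Hypothetical helper written against the Map interface, not a concrete class.
    static Map<String, String> fill(Map<String, String> dictionary) {
        dictionary.put("banana", "yellow");
        dictionary.put("apple", "red");
        return dictionary;
    }

    public static void main(String[] args) {
        Map<String, String> hashed = new HashMap<>(); // Java 7+ diamond operator
        Map<String, String> sorted = new TreeMap<>(); // swap implementations in one place
        fill(hashed);
        fill(sorted);
        System.out.println(hashed.equals(sorted)); // true: Map equality compares entries, not classes
        System.out.println(sorted.keySet());       // [apple, banana]
    }
}
```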

aroth
  • The Java 7 style of not having to specify the generic parameters on the right side of the expression is even better. – Scorpion Sep 25 '11 at 11:57
  • *...and of course, if you happen to know how many elements will be placed in the map, you should specify that as well.* Divided by the load factor and rounded up to the next power of 2. If you specify `new HashMap(7)` and put 7 elements, the table is first initialized to 8 and then grown (rehashed) to 16. – bestsss Sep 25 '11 at 18:07
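bestsss's arithmetic can be written down as a pair of small helpers. The names capacityFor and roundUpToPowerOfTwo are hypothetical; the power-of-two rounding is a simplified model of how HashMap sizes its table, capped at 2^30 to avoid overflow.

```java
import java.util.HashMap;
import java.util.Map;

final class Presize {
    // Hypothetical helper: smallest capacity whose threshold (capacity * loadFactor) holds `expected`.
    static int capacityFor(int expected, float loadFactor) {
        return (int) Math.ceil(expected / (double) loadFactor);
    }

    // Simplified model of HashMap's table sizing: round up to a power of two, capped at 2^30.
    static int roundUpToPowerOfTwo(int n) {
        int p = 1;
        while (p < n && p < (1 << 30)) {
            p <<= 1;
        }
        return p;
    }

    public static void main(String[] args) {
        // new HashMap(7): table of 8, threshold 6, so the 7th put triggers a resize to 16.
        int table = roundUpToPowerOfTwo(7);
        System.out.println(table + " " + (int) (table * 0.75f)); // 8 6

        // Asking for ceil(7 / 0.75) = 10 instead gives a table of 16 (threshold 12): no resize.
        int cap = capacityFor(7, 0.75f);
        System.out.println(cap + " " + roundUpToPowerOfTwo(cap)); // 10 16

        Map<String, String> dictionary = new HashMap<>(capacityFor(7, 0.75f));
        dictionary.put("key", "value");
    }
}
```

On JDK 19 and later, the static factory HashMap.newHashMap(int numMappings) performs this load-factor calculation for you.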

HashMap automatically increases its size when it needs to. The best approach is to anticipate roughly how many elements you will store; if the figure is large, set the initial capacity to a number that will not require constant resizing. Furthermore, if you read the Javadoc for HashMap, you will see that the default initial capacity is 16 and the default load factor is 0.75, which means that once the map is 75% full it automatically resizes. So if you expect to hold 1 million elements, you will naturally want a larger initial capacity than the default one.

LordDoskias

First of all, I would declare it using the Map interface type:

Map<String,String> dictionary = new HashMap<String,String>();

Does the fact that I do not set a size on the construction of the dictionary make the performance decrease?

Yes, initial capacity should be set for better performance.

Would it need to allocate new memory for the table as elements increase?

Yes, the load factor also affects performance.

More detail in the HashMap docs.

NimChimpsky

As stated in the HashMap Javadoc, the default initial capacity is 16 and the default load factor is 0.75. You can change either one using the other constructors; which values to use depends on your usage (though the defaults are generally good for general-purpose use).
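The constructors mentioned above, in a minimal sketch (the class ConstructorDemo and its helper rejects are hypothetical). The (initialCapacity, loadFactor) constructor throws IllegalArgumentException for a negative capacity or a non-positive load factor, which the helper checks.

```java
import java.util.HashMap;
import java.util.Map;

final class ConstructorDemo {
    // Hypothetical helper: true if HashMap's (capacity, loadFactor) constructor rejects the arguments.
    static boolean rejects(int initialCapacity, float loadFactor) {
        try {
            new HashMap<String, String>(initialCapacity, loadFactor);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new HashMap<>();       // capacity 16, load factor 0.75
        Map<String, String> sized = new HashMap<>(64);        // initial capacity 64, load factor 0.75
        Map<String, String> sparse = new HashMap<>(64, 0.5f); // resizes earlier: more memory, fewer collisions on average
        defaults.put("hello", "world");
        Map<String, String> copy = new HashMap<>(defaults);   // sized to hold the source map's entries

        System.out.println(copy);               // {hello=world}
        System.out.println(rejects(16, 0f));    // true: load factor must be positive
        System.out.println(rejects(-1, 0.75f)); // true: capacity must be non-negative
        System.out.println(rejects(64, 0.5f));  // false
    }
}
```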

Eran Zimmerman Gonen