
Loading 1,000,000 numbers into a TreeMap (a binary search tree) takes 2 seconds, but loading them into a HashMap takes milliseconds (in Java).
The only difference between the two that I can see is that I can set a HashMap's initial size, so it does not constantly need to resize.

Am I wrong to assume a TreeMap's array's initial size should be able to be set? Is there a different reason that it is so slow?
Is there a logical reason why one cannot set the size of a TreeMap, or of any generic binary search tree, or is this assumption wrong?

user2316667
  • That is not the only difference. Insertions into the treemap take O(log n) while the hashmap takes O(1). – Zong Aug 26 '13 at 00:22
  • It doesn't. TreeMap and HashMap use different structures to store their internal data. Each insert into a TreeMap needs to resolve the position in the tree where the new entry must be placed, and that takes time. – MadProgrammer Aug 26 '13 at 00:22
  • Today you learned how *awesomely* fast a hash map is. – Boann Aug 26 '13 at 03:09

4 Answers


Unlike HashMap, which re-allocates its internal table as new entries get inserted, TreeMap does not generally reallocate its nodes when adding new ones. The difference can be very loosely illustrated as that between an ArrayList and a LinkedList: the first reallocates to resize, while the second does not. That is why setting the initial size of a TreeMap is roughly as meaningless as trying to set the initial size of a LinkedList.
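The linked-node point can be made concrete with a deliberately simplified sketch. The `SimpleTreeMap` class below is hypothetical and unbalanced (the real `TreeMap` is a red-black tree that also rebalances on insert); it only shows that each new key costs exactly one `Node` allocation linked into the tree, with no backing array and therefore no capacity to preset.

```java
import java.util.Objects;

// Hypothetical, simplified BST map: no rebalancing, no removal.
// It only illustrates that a node-based tree grows one allocation at a time.
final class SimpleTreeMap<K extends Comparable<K>, V> {

    // One node per entry, linked to at most two children.
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> left, right;

        Node(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private Node<K, V> root;
    private int size;

    // Walks down from the root and attaches a single new node at an empty link.
    void put(K key, V value) {
        Objects.requireNonNull(key);
        if (root == null) {
            root = new Node<>(key, value);
            size++;
            return;
        }
        Node<K, V> cur = root;
        while (true) {
            int cmp = key.compareTo(cur.key);
            if (cmp == 0) {          // existing key: overwrite, allocate nothing
                cur.value = value;
                return;
            }
            Node<K, V> next = cmp < 0 ? cur.left : cur.right;
            if (next == null) {
                Node<K, V> node = new Node<>(key, value); // the only allocation
                if (cmp < 0) cur.left = node; else cur.right = node;
                size++;
                return;
            }
            cur = next;
        }
    }

    V get(K key) {
        Node<K, V> cur = root;
        while (cur != null) {
            int cmp = key.compareTo(cur.key);
            if (cmp == 0) return cur.value;
            cur = cmp < 0 ? cur.left : cur.right;
        }
        return null;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        SimpleTreeMap<Integer, String> map = new SimpleTreeMap<>();
        for (int k : new int[] {5, 2, 8, 2}) {
            map.put(k, "v" + k);
        }
        System.out.println(map.size()); // prints 3 (the second 2 overwrote the first)
        System.out.println(map.get(8)); // prints v8
    }
}
```

Because growth happens one node at a time, there is never a bulk copy to avoid, which is what an initial capacity buys you in `HashMap` or `ArrayList`.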

The speed difference is due to the different time complexity of the two containers: inserting N nodes into a HashMap is O(N) on average, while for a TreeMap it is O(N log N), which for 1,000,000 nodes is an asymptotic difference of roughly 20 times (log2 1,000,000 ≈ 20). Although the difference in asymptotic complexity does not translate directly into a timing difference, because the constant factors of the two algorithms differ, it is a good way to decide which algorithm will be faster on very large inputs.
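The rough factor of 20 is just log2 of 1,000,000 (about 19.9). One way to observe the log N factor directly, assuming key comparisons dominate the insert cost, is to hand `TreeMap` a counting `Comparator` through its `TreeMap(Comparator)` constructor and divide the total by the number of inserts. The class name and the fixed shuffle seed are illustrative choices, not part of any API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.TreeMap;

// Counts how many key comparisons TreeMap performs while inserting n distinct keys.
final class TreeMapComparisons {

    static double averageComparisons(int n) {
        // Distinct keys 0..n-1 in a shuffled, fixed-seed order.
        List<Integer> keys = new ArrayList<>(n);
        for (int i = 0; i < n; i++) keys.add(i);
        Collections.shuffle(keys, new Random(42));

        long[] count = {0}; // mutable counter captured by the lambda
        Comparator<Integer> counting = (a, b) -> {
            count[0]++;
            return Integer.compare(a, b);
        };

        TreeMap<Integer, Integer> map = new TreeMap<>(counting);
        for (Integer k : keys) map.put(k, k);
        return (double) count[0] / n;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        double avg = averageComparisons(n);
        double log2n = Math.log(n) / Math.log(2);
        System.out.printf("comparisons per insert: %.1f, log2(n): %.1f%n", avg, log2n);
    }
}
```

The average per insert should land in the same ballpark as log2 n and grow by roughly one each time n doubles, whereas an insert into a well-distributed HashMap does a constant amount of hashing and equality checking regardless of n.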

Sergey Kalinichenko

Am I wrong to assume a TreeMap's array's initial size should be able to be set?

Yes, that assumption is incorrect. A TreeMap doesn't have an array. A TreeMap is made of binary tree nodes, each with at most 2 children.

If you are suggesting that the number of children in a tree node should be a parameter, then you need to figure out how that affects search time. And I think that it turns the search time from O(log2 N) to O(log2 M * log2(N/M)), where N is the number of elements and M is the average number of children per node. (And I'm making some optimistic assumptions ...) That's not a "win".

Is there a different reason that it is so slow?

Yes. The reason that a (large) TreeMap is slow relative to a (large) HashMap under optimal circumstances is that a lookup in a balanced binary tree with N entries requires looking at roughly log2 N tree nodes. By contrast, in an optimal HashMap a lookup involves 1 hashcode calculation and looking at O(1) hash chain nodes.
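To see this per-lookup difference, a hypothetical `CountingKey` class can count calls to `compareTo`, `hashCode` and `equals` while both maps are queried. This is a sketch that assumes sequential `int` values (which spread well under `Integer.hashCode`) and uses static counters, so it is single-threaded only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Compares per-lookup work in TreeMap and HashMap by counting calls on the key type.
final class LookupCost {

    // Hypothetical key that counts compareTo, hashCode and equals calls.
    // The counters are static and unsynchronized, so this is single-threaded only.
    static final class CountingKey implements Comparable<CountingKey> {
        static long compares, hashes, equalsCalls;
        final int value;

        CountingKey(int value) {
            this.value = value;
        }

        @Override
        public int compareTo(CountingKey other) {
            compares++;
            return Integer.compare(value, other.value);
        }

        @Override
        public int hashCode() {
            hashes++;
            return Integer.hashCode(value);
        }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof CountingKey && ((CountingKey) o).value == value;
        }

        static void reset() {
            compares = hashes = equalsCalls = 0;
        }
    }

    // Fills the map with n keys, resets the counters, then looks up every key
    // using fresh objects so HashMap cannot short-circuit on reference equality.
    static void fillAndLookUp(Map<CountingKey, Integer> map, int n) {
        for (int i = 0; i < n; i++) map.put(new CountingKey(i), i);
        CountingKey.reset();
        for (int i = 0; i < n; i++) map.get(new CountingKey(i));
    }

    public static void main(String[] args) {
        int n = 1_000_000;

        fillAndLookUp(new TreeMap<>(), n);
        System.out.printf("TreeMap: %.1f compareTo calls per get%n",
                (double) CountingKey.compares / n);

        fillAndLookUp(new HashMap<>(), n);
        System.out.printf("HashMap: %.1f hashCode and %.1f equals calls per get%n",
                (double) CountingKey.hashes / n,
                (double) CountingKey.equalsCalls / n);
    }
}
```

On a typical run the TreeMap figure should track log2 n, while the HashMap figures should stay near one `hashCode` call and about one `equals` call per lookup. Fresh key objects are used for the lookups because `HashMap` skips `equals` when it finds the identical reference.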

Notes:

  1. TreeMap uses a self-balancing binary tree (a red-black tree), so O(log2 N) is the worst-case lookup time.
  2. HashMap performance depends on the collision rate of the hash function and key space. In the worst case where all keys end up on the same hash chain, a HashMap has O(N) lookup.
  3. In theory, HashMap performance becomes O(N) once you reach the maximum possible hash array size, i.e. 2^30 buckets in the JDK implementation. But if you have a HashMap that large, you should probably be looking at an alternative map implementation with better memory usage and garbage collection characteristics.
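Note 2 can be reproduced with a deliberately bad, hypothetical key class whose `hashCode` always returns the same value, so every entry lands in one bucket. Since Java 8, `HashMap` converts a heavily colliding bucket into a tree, but that only helps when the keys can be ordered (for example by implementing `Comparable`); `BadKey` below is intentionally not `Comparable`, so lookups should still fall back to scanning.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrates HashMap's worst case: every key collides into a single bucket.
final class CollisionDemo {

    // Hypothetical key with a deliberately terrible hashCode. It does not implement
    // Comparable, so HashMap has no ordering to use inside the colliding bucket.
    static final class BadKey {
        static long equalsCalls;
        final int value;

        BadKey(int value) {
            this.value = value;
        }

        @Override
        public int hashCode() {
            return 42; // every instance lands in the same bucket
        }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof BadKey && ((BadKey) o).value == value;
        }
    }

    // Average number of equals calls per successful get on a map of n colliding keys.
    static double equalsPerGet(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) map.put(new BadKey(i), i);
        BadKey.equalsCalls = 0;
        for (int i = 0; i < n; i++) map.get(new BadKey(i));
        return (double) BadKey.equalsCalls / n;
    }

    public static void main(String[] args) {
        for (int n : new int[] {500, 1000, 2000}) {
            System.out.printf("n=%d: %.0f equals calls per get%n", n, equalsPerGet(n));
        }
    }
}
```

Doubling n should roughly double the `equals` calls per lookup, which is the O(N) worst case described in note 2.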
Stephen C

A TreeMap is always balanced. Every time you add a node to the tree, it must make sure the nodes are all in order according to the provided comparator (or the keys' natural ordering) and rebalance if needed. There is no size to specify because a TreeMap allocates a node per entry as it goes; it is designed to keep a sorted group of nodes that is easy to traverse in order.
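The ordering behaviour described here can be seen by giving a `TreeMap` a custom comparator. This sketch uses a hypothetical ordering (by string length, ties broken alphabetically) purely for illustration; iteration, `firstKey` and `headMap` all follow that comparator.

```java
import java.util.Comparator;
import java.util.TreeMap;

// A TreeMap keeps its keys ordered by the comparator it was constructed with.
final class SortedTraversal {

    // Hypothetical ordering for the demo: shorter strings first, ties broken alphabetically.
    static final Comparator<String> BY_LENGTH =
            Comparator.comparingInt(String::length).thenComparing(Comparator.naturalOrder());

    static TreeMap<String, Integer> byLength(String... words) {
        TreeMap<String, Integer> map = new TreeMap<>(BY_LENGTH);
        for (String w : words) map.put(w, w.length());
        return map;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> map = byLength("pear", "fig", "banana", "kiwi", "apple");
        System.out.println(String.join(",", map.keySet())); // prints fig,kiwi,pear,apple,banana
        System.out.println(map.firstKey());                   // prints fig
        System.out.println(map.headMap("pear"));              // prints {fig=3, kiwi=4}
    }
}
```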

A HashMap needs a sizeable amount of free space for the things that you store in it. My professor always told me that it needs 5 times as much space as the objects you are storing in it. So specifying the size when you create the HashMap improves its speed. Otherwise, if more objects go into the HashMap than you planned for, it has to "size up" (resize and rehash).
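The 5 times figure is questionable: the JDK's default load factor is 0.75, and the `HashMap` javadoc notes that no rehash operations will occur if the initial capacity is greater than the maximum number of entries divided by the load factor. A minimal sketch of that calculation, with `capacityFor` as a hypothetical helper name:

```java
import java.util.HashMap;
import java.util.Map;

// Presizing a HashMap so that n insertions never trigger a resize.
final class Presize {

    // HashMap's default load factor, as documented in the HashMap javadoc.
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Smallest initial capacity whose resize threshold (capacity * load factor)
    // is at least the expected number of entries.
    static int capacityFor(int expectedEntries) {
        return (int) Math.ceil(expectedEntries / (double) DEFAULT_LOAD_FACTOR);
    }

    static Map<Integer, Integer> load(int n) {
        Map<Integer, Integer> map = new HashMap<>(capacityFor(n));
        for (int i = 0; i < n; i++) map.put(i, i);
        return map;
    }

    public static void main(String[] args) {
        System.out.println(capacityFor(1_000_000)); // prints 1333334
        System.out.println(load(1_000_000).size()); // prints 1000000
    }
}
```

On Java 19 and later, `HashMap.newHashMap(int numMappings)` should perform an equivalent calculation, so a helper like this is mainly useful on older versions. There is no counterpart for `TreeMap`, since it has no table to presize.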

(edited for spelling)

eddiecubed
  • Strange statement from your professor. The generally accepted loading is 80%, not 20%. Are you sure you have it the right way round? – user207421 Feb 24 '20 at 04:21
  • It's been 7 years since I wrote this, and I don't have the notes from this class anymore. Either case is likely (the professor speaking incorrectly, or me flipping the statement around). At this point in my career, I wouldn't trust an old Stack Overflow post; I'd start with the documentation. https://docs.oracle.com/javase/10/docs/api/java/util/HashMap.html The JDK uses a default load factor of .75. Where is it generally accepted to be .80? – eddiecubed Feb 26 '20 at 22:30

Am I wrong to assume a TreeMap's array's initial size should be able to be set?

Yes. It doesn't have an array. It has a tree.

user207421