HashMap: initialCapacity=1000; loadFactor=0.75

The above means that the HashMap will resize around the 1000*0.75 = 750th entry, to a capacity of 2000. Would rehashing take place at this time? If yes, then how will the performance be affected? If not, then when? At MAX_CAPACITY?
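For reference, the resize trigger is just the product of capacity and load factor. A minimal sketch of that arithmetic (note: the real HashMap rounds a requested capacity up to the next power of two, so 1000 actually becomes 1024 internally):

```java
public class HashMapThreshold {
    // Resize is triggered when size exceeds capacity * loadFactor.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(1000, 0.75f)); // 750, using the question's round numbers
        System.out.println(threshold(1024, 0.75f)); // 768, using the real power-of-two capacity
    }
}
```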

TreeMap: No rehashing, but sorting. The documentation suggests that insertion/reading/search is always O(log N). However, doesn't every sort/new-entry/delete-entry effectively restructure the entire TreeMap?

How do the two compare in terms of Big-O notation for the above scenarios and overall performance?

HashMap and ConcurrentHashMap are heavily used implementations, but TreeMap is not used nearly as much in comparison. I would expect a TreeMap that only adds, seldom deletes, but is searched heavily to be preferable to HashMap/Hashtable implementations.

Any comment is appreciated.

EDIT: In terms of data-structure amortization, what worst-case performance should best practices take into account? For example, rehashing of a hash-based Map and/or rebalancing of a tree-based Map or Set. There are certain trade-offs, but assume the data structure is constantly pressed for modification due to highly unpredictable throughput.

Ashley
  • Note that in terms of Big O we don't consider things like array resizing and tree rebalancing because it is a way to generalize the type of algorithm used, not represent exact performance. Insertion is always O(1) for a hash table and O(log(n)) for a red-black tree. – Radiodef Mar 07 '14 at 23:09
  • @Radiodef, understood. However, that's the piece I'm looking for an answer to. Big-O describes the best scenario for an algorithm, but in reality it isn't ideal. What happens to access/insert/search time while a HashMap is rehashing and/or a TreeMap is resizing? Is that my worst case? How much worse? Doesn't sorting or use of a Comparator/Comparable take effect? When? – Ashley Mar 07 '14 at 23:41
  • Well the important thing is that as far as performance, a hash table will outperform any other data structure for access. In the case where you *need* sorting, a red-black tree will outperform any other data structure at that. When resizing/reordering stuff happens of course you get a small chug but most data structures need to do something like this from time to time. – Radiodef Mar 08 '14 at 00:04
  • BTW [measurements have been made before on HashMap parameters](http://stackoverflow.com/questions/7115445/what-is-the-optimal-capacity-and-load-factor-for-a-fixed-size-hashmap). They are very difficult to quantify exactly. – Radiodef Mar 08 '14 at 00:06
  • This is a great link. Thanks for the info. – Ashley Mar 08 '14 at 00:10

1 Answer


The above means that the HashMap will resize around the 1000*0.75 = 750th entry, to a capacity of 2000.

'Approximately twice the number of buckets' is what the Javadoc says. You're adding precision that isn't warranted.

Would rehashing take place at this time?

Yes, according to the Javadoc. You don't seem to have read it.

If yes, then how will the performance be affected?

The Javadoc says there will be a rehash of the entire HashMap. This is O(N) of course, but it only happens occasionally, so the amortized cost per insertion remains constant.
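If you want to rule out rehashing entirely, size the map for the expected entry count up front. A minimal sketch, assuming the default load factor of 0.75 (the helper name `withCapacityFor` is illustrative, not a library method):

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedMap {
    // To hold `expected` entries without a rehash, request a capacity
    // such that expected <= capacity * loadFactor.
    static <K, V> Map<K, V> withCapacityFor(int expected) {
        int initialCapacity = (int) Math.ceil(expected / 0.75);
        return new HashMap<>(initialCapacity);
    }

    public static void main(String[] args) {
        Map<String, Integer> m = withCapacityFor(200_000);
        for (int i = 0; i < 200_000; i++) {
            m.put("key" + i, i); // no resize should occur during these puts
        }
        System.out.println(m.size()); // 200000
    }
}
```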

If not, then when? at MAX_CAPACITY?

See above.

TreeMap: No rehashing, but sorting.

No rehashing, and no sorting either. Just maintenance of an ordered data structure. It isn't the same thing.
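To illustrate the distinction: entries land in their sorted position as they are inserted, so iteration order is always key order and no separate sort step ever runs. A small sketch:

```java
import java.util.TreeMap;

public class OrderedOnInsert {
    public static void main(String[] args) {
        TreeMap<Integer, String> map = new TreeMap<>();
        // Keys inserted in arbitrary order...
        map.put(30, "c");
        map.put(10, "a");
        map.put(20, "b");
        // ...are kept in key order by the red-black tree; no sort runs.
        System.out.println(map.firstKey()); // 10
        System.out.println(map.keySet());   // [10, 20, 30]
    }
}
```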

Documentation suggests that insertion/reading/search is always O(log N).

The documentation specifies that. It isn't just a suggestion.

However, doesn't every sort/new-entry/delete-entry effectively restructure the entire TreeMap?

No, because it isn't held in an array. The Javadoc says it is implemented as a Red-black search tree.

How do the two compare in terms of Big-O notation for the above scenarios and overall performance?

As documented. HashMap is expected O(1) and TreeMap is O(log N).
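The practical trade-off: both support plain lookup, but only TreeMap answers order-based queries, which is the reason to pay the O(log N) cost at all. A sketch of what each buys you:

```java
import java.util.HashMap;
import java.util.TreeMap;

public class LookupVsNavigation {
    public static void main(String[] args) {
        HashMap<Integer, String> hash = new HashMap<>();
        TreeMap<Integer, String> tree = new TreeMap<>();
        for (int k : new int[] {5, 15, 25}) {
            hash.put(k, "v" + k);
            tree.put(k, "v" + k);
        }
        // Plain lookup: expected O(1) in HashMap, O(log N) in TreeMap.
        System.out.println(hash.get(15)); // v15
        // Order-based queries exist only on TreeMap (NavigableMap).
        System.out.println(tree.floorKey(20));   // 15 (largest key <= 20)
        System.out.println(tree.ceilingKey(20)); // 25 (smallest key >= 20)
    }
}
```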

I would expect a TreeMap that only adds, seldom deletes, but is searched heavily to be preferable to HashMap/Hashtable implementations.

I don't. It isn't.

user207421
  • Your answers are precise; however, I guess I am trying to look for an answer on the performance effects of rehashing. Rehashing is a costly operation, regardless of the O(1) performance of a HashMap, but what happens when a put operation is called precisely while rehashing is happening? Is it still O(1)? How to avoid rehashing completely? Have a fixed size of HashMap, knowing the max throughput? Can you please explain "I don't. It isn't."? Thanks – Ashley Mar 07 '14 at 23:38
  • Rehashing takes place as the *result* of a put. If you mean two concurrent puts, the behaviour is undefined anyway regardless of rehashing. You can avoid rehashing by making the initial size large enough, but instead of worrying overmuch about it I would first do some *measurements* to see whether you even have a problem in the first place. HashMap could probably rehash a million entries in a small fraction of a second. I don't agree that a TreeMap is preferable to a HashMap in any circumstance where you don't require ordering, unless the key hashCodes are degenerate. – user207421 Mar 07 '14 at 23:43
  • Thanks again for the details. In my case, when I use a HashMap for a high throughput, say 50K entries per second, rehashing is not desirable, and a potential GC run due to copying over to a new Entry array with a new size also potentially adds to the GC pause time. So I am trying to find a balance: if I know that my max entries are, say, 200K at any given time, then would you agree that a rehash won't happen? For TreeMap, isn't O(log n) generally faster than O(1)? And I know O(1) varies from scenario to scenario. – Ashley Mar 07 '14 at 23:53
  • I think we can't avoid rebalancing of a TreeMap, but we can avoid rehashing of a hash-based map. Would you agree? – Ashley Mar 07 '14 at 23:56
  • O(log N) cannot possibly be 'generally faster than O(1)' unless log N < 1. I don't know what you mean by 'new Entry class with a new size'. There is an Entry class per, um, entry, regardless of rehashing. There is an *array* of Entry classes, is that what you mean? You can avoid rehashing by the means I mentioned above. GC is concurrent these days. I haven't seen a GC pause this century. – user207421 Mar 07 '14 at 23:58