By now many of you must have heard about HashDoS. The researchers who found it claim in their video that the worst-case complexity of hash tables is O(n^2). How can this be?

AppleGrew
- possible duplicate of [Time complexity of Hash table](http://stackoverflow.com/questions/3949217/time-complexity-of-hash-table) – Raymond Chen Dec 30 '11 at 07:41
- I do not think this is a duplicate. The question is about O(n^2), which was not addressed in the previous question. – Mike Nakis Dec 30 '11 at 07:45
- It's not a duplicate; it's simply a case of someone not reading/understanding the material they're asking about. Mike is correct below: it's O(n) for inserting any one element and O(n^2) for inserting a *set of n elements* (if you're creating collisions). This is exactly what they state and have on their slides. – Brian Roach Dec 30 '11 at 07:55
- It's not an exact duplicate, but the answer there also answers this question: if each operation is O(n) and you perform n operations, then the total time is O(n^2). – Raymond Chen Dec 31 '11 at 00:40
1 Answer
The question is worded in an incorrect way. The researchers do not claim that "the worst-case complexity of hash tables is O(n^2)".
What they claim is that "The [...] complexity of inserting n elements into the table [...] goes to O(n^2)." So the complexity of a single operation is O(n), which makes sense: if all keys have the same hash, then they all go into the same bucket, which is just an array or a linked list, so it must be searched linearly. Inserting the i-th element then costs O(i), and summing that cost over i = 1 ... n gives O(n^2) for the whole sequence of insertions.
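
To make the per-operation cost concrete, here is a minimal sketch of a chained hash table (illustrative only; `SimpleHashTable` and `Entry` are names invented for this example, not taken from any real library). The linear scan in `put` is the O(n) step:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative chained hash table; names are invented for this sketch.
class SimpleHashTable<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final List<List<Entry<K, V>>> buckets;

    SimpleHashTable(int capacity) {
        buckets = new ArrayList<>(capacity);
        for (int i = 0; i < capacity; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    void put(K key, V value) {
        int index = Math.floorMod(key.hashCode(), buckets.size());
        List<Entry<K, V>> bucket = buckets.get(index);
        // The O(n) step: if every key collides, this scan walks the whole
        // bucket, so the i-th insert costs O(i) and n inserts cost O(n^2).
        for (Entry<K, V> e : bucket) {
            if (e.key.equals(key)) {
                e.value = value;  // key already present: replace its entry
                return;
            }
        }
        bucket.add(new Entry<>(key, value));
    }
}
```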

Mike Nakis
- Thus, that claim merely serves to highlight the importance of using a good hash function. A good hash function stands a better chance of reaching amortized O(1) time than a weaker one; with a good function, only very few inputs will drive the hash table to its worst case of O(n). – Peter O. Dec 30 '11 at 07:57
- @PeterO. Actually that will not help here. The attacker knows ahead of time what data to send to generate the collisions (as they have access to said hash functions/libraries) and sends the data that creates such collisions (keys for POST parameters in this case). A randomizing hash implementation prevents the attacker from being able to pre-compute this list of colliding keys, and limiting the number of keys allowed mitigates the extreme `n^2` behavior. – Dec 30 '11 at 07:59
- @MikeNakis: In that case, it's a problem that no hash function can solve: hash collisions are inevitable for any hash code of finite length. – Peter O. Dec 30 '11 at 08:01
- @PeterO. The problem can be solved by introducing a magic number that hashable objects include in their hashcode computations, so that every web site out there has a different, secret magic number, thus preventing an outsider from coming up with strings whose hashcodes are identical. – Mike Nakis Dec 30 '11 at 08:15
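
A rough sketch of that "secret magic number" idea, assuming a Java-style multiplicative string hash (`SeededHash` and `seededHash` are hypothetical names, not a standard API):

```java
import java.security.SecureRandom;

// Sketch of randomized hashing; SeededHash and seededHash are
// hypothetical names, not part of any standard library.
final class SeededHash {
    // Chosen once per process (per site) and kept secret: an attacker
    // who does not know SEED cannot pre-compute colliding keys.
    private static final int SEED = new SecureRandom().nextInt();

    static int seededHash(String s) {
        int h = SEED;  // String.hashCode starts at 0; here we start at the seed
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }
}
```

This seeding is only illustrative; folding a seed into the initial value of a multiplicative hash is weaker than a proper keyed hash, and runtimes that adopted randomized hashing after HashDoS generally moved to keyed functions such as SipHash.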
- @MikeNakis I am sorry that my question didn't make sense, but you still got what I was asking. I could not understand your explanation, though; it seems you are merely restating their statement. Let me explain verbosely. I am talking about the worst case, so it is evident that all the keys generate the same hash. To improve the access time, the items in a bucket are typically sorted, so the 'worst case' hash is actually like a sorted array. Do you mean that in this case it is not possible to employ a sorting algorithm which can sort in `O(n log n)`? – AppleGrew Dec 30 '11 at 09:53
- @AppleGrew The items in the buckets are not sorted, because they cannot be sorted. You cannot sort by hashcode, because all items in a bucket by definition have the same hashcode, and you cannot sort by value, because a hashmap cannot demand that the values you put in it be comparable objects. – Mike Nakis Dec 30 '11 at 10:20
- @AppleGrew also, I did not say that your question didn't make any sense, I only said that the sentence "the worst-case complexity of hash tables is O(n^2)" does not make sense, because computational complexity is a feature of an operation that can be performed on a data structure, not a feature of the data structure itself. In any case, I admit that this was unnecessarily harsh on my behalf; after all, it is true that your wording is in frequent colloquial use, and it usually refers to the operation which can be inferred by context. So, I will reword my answer. – Mike Nakis Dec 30 '11 at 12:59
- @MikeNakis Sorry, I too was harsh. Chains in a bucket are usually implemented by a linked list, right? If so, then inserting an element should take constant time (adding it to the head), so inserting n elements is again `O(n)`. I still do not see how we arrive at `O(n^2)`. Can you please point me to some real code for a hashtable implementation? – AppleGrew Dec 30 '11 at 14:20
- @AppleGrew Well, insertion is not a matter of just placing an item in the bucket, because you first have to see if you already have the item, and if so, replace its entry. So you have to traverse the bucket. The only implementation of a hashset that I have ever seen is the hashset of the CLI (C#, VB, etc.), which you can download from Microsoft (search for SSCLI and/or "Rotor"), but it is **huge**, so making sense of it is no walk in the park. – Mike Nakis Dec 30 '11 at 14:40
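
A quick way to see the quadratic blow-up for yourself is to give every key the same hashcode and time bulk inserts into a real map. A hedged sketch: the `CollidingKey` class is invented for this demo, and note that Java 8+ `HashMap` converts long chains into balanced trees, which blunts the effect on modern JDKs; the clean quadratic growth matches the 2011-era linked-list buckets the talk targeted.

```java
import java.util.HashMap;

// Toy demo: every key hashes to the same bucket, so each put() must walk
// the whole chain before appending. CollidingKey is invented for this demo.
final class CollidingKey {
    private final int id;

    CollidingKey(int id) { this.id = id; }

    @Override public int hashCode() { return 42; }  // force every key into one bucket

    @Override public boolean equals(Object o) {
        return o instanceof CollidingKey && ((CollidingKey) o).id == id;
    }

    public static void main(String[] args) {
        for (int n = 1_000; n <= 16_000; n *= 2) {
            HashMap<CollidingKey, Integer> map = new HashMap<>();
            long start = System.nanoTime();
            for (int i = 0; i < n; i++) {
                map.put(new CollidingKey(i), i);
            }
            long millis = (System.nanoTime() - start) / 1_000_000;
            // doubling n should roughly quadruple the time on a list-based table
            System.out.println(n + " inserts took " + millis + " ms");
        }
    }
}
```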