
Given an array of integers, what is the worst-case time complexity of finding a pair of equal integers?

I think this can be done in O(n) by using counting sort or by using XOR. Am I right?

The question is not worried about space complexity, and the given answer says O(n log n).

Garrick
  • Please find a more meaningful title. – moooeeeep Aug 30 '16 at 18:04
  • Counting sort requires constrained input. Although hashing can be used, the worst case time complexity won't be O(n). – Nelfeal Aug 30 '16 at 18:05
  • @moooeeeep the world is not as perfect as one thinks, which makes it interesting – Pavneet_Singh Aug 30 '16 at 18:05
  • @Nelxiost, with hashing (additional O(n) memory), this should be doable in O(n) worst-case time. – Kedar Mhaswade Aug 30 '16 at 18:07
  • I think counting sort can give O(n). – Garrick Aug 30 '16 at 18:09
  • @KedarMhaswade No, that's the average complexity. Worst-case search or insert for hash tables is O(n) because of collisions (and maybe rehashing). – Nelfeal Aug 30 '16 at 18:09
  • Counting sort needs another O(n) memory. I'd prefer a hash table. – Kedar Mhaswade Aug 30 '16 at 18:09
  • Well, with reasonable load factor, hash function and integer keys, collisions should be rare. – Kedar Mhaswade Aug 30 '16 at 18:11
  • How would you do it with a hash table? – Garrick Aug 30 '16 at 18:12
  • @Willturner -- check if an integer already exists in the table; if not, add it; if it exists, return the duplicate. – Kedar Mhaswade Aug 30 '16 at 18:13
  • @KedarMhaswade Of course. That's the point of hashing. The thing is, complexity is a theory, and the question is (apparently) about worst-cases. – Nelfeal Aug 30 '16 at 18:14
  • I think the question is underspecified. It should state whether additional memory is available. I believe in practice this can be solved in O(n) asymptotic time using a hash table. Are you worried about hash functions? – Kedar Mhaswade Aug 30 '16 at 18:15
  • @KedarMhaswade That's not the point. The problem can indeed be solved in O(n) average time with a hash table, there is no denying that. But the worst-case time will not be linear in the general case, whatever the hash function. Then again, I am not sure Willturner knows the difference, since he apparently doesn't know how to use hash tables. – Nelfeal Aug 30 '16 at 18:19
  • Actually, the answer says O(n log n). I don't think so. Moreover, I don't think the question is worried about extra space. – Garrick Aug 30 '16 at 18:20
  • Well, if the answer says O(n log n), you could actually sort with any standard comparison sorting algorithm and compare consecutive elements. – Rishit Sanmukhani Aug 30 '16 at 18:26
  • But the options do include O(n). I think O(n) is possible. – Garrick Aug 30 '16 at 18:28
  • Who is answer? What are you actually asking? If your professor / course / whatever resource is wrong? They are not. You cannot solve this in under `O(n log n)` for the general case. If you know, however, that all values are in the range `[0, m]` you can solve this in linear time and `O(m)` space by simply counting occurrences. – Vincent van der Weele Aug 30 '16 at 18:28
  • A radix sort can be done in place with complexity of O(kn) where k is some constant based on the number of bits in an integer (so still O(n) complexity). After the radix sort is done, a pass can be made through the array to find a matching pair (if one exists). Total worst case complexity is still O(n). – Michael Burr Aug 30 '16 at 18:29
  • @MichaelBurr you really cannot use the assumption that the size of an integer is bounded in asymptotic complexity. That would mean that every algorithm on an array of integers runs in `O(1)` time because the size and the values are bounded by machine limitations and thus you can enumerate all possible solutions. – Vincent van der Weele Aug 30 '16 at 18:31
  • @VincentvanderWeele: then if additional space isn't a concern, the counting sort algorithm should be acceptable - all it takes to know the range of the set of values is a single pass through the input array. Once you know the lower & upper bounds, you can do the counting sort. – Michael Burr Aug 30 '16 at 18:45
  • But there is no bound here. – Garrick Aug 30 '16 at 18:47

3 Answers


Counting sort

If the input allows you to use counting sort, then all you have to do is sort the input array in O(n) time and then look for duplicates, also in O(n) time. This algorithm can be improved (although not in complexity), since you don't actually need to sort the array. You can create the same auxiliary array that counting sort uses, which is indexed by the input integers, and then insert the integers one by one until you encounter one that has already been inserted. At this point, the two equal integers have been found.

This solution provides worst-case, average and best-case linear time complexities (O(n)), but requires the input integers to be in a known and ideally small range.
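
A minimal sketch of that improved counting approach in Python (the function name and the explicit range bound k are illustrative assumptions, not part of the original answer):

def find_duplicate_counting(arr, k):
    # assumes every value in arr is an integer in the range 0..k-1
    seen = [False] * k      # the auxiliary array indexed by the input integers
    for v in arr:
        if seen[v]:
            return v        # two equal integers found: O(n) worst case
        seen[v] = True
    return None             # no duplicate pair exists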

Hashing

If you cannot use counting sort, then you could fall back on hashing and use the same solution as before (without sorting), with a hash table instead of the auxiliary array. The issue with hash tables is that the worst-case time complexity of their operations is linear, not constant. Indeed, due to collisions and rehashing, insertions are done in O(n) time in the worst case.

Since you need O(n) insertions, that makes the worst-case time complexity of this solution quadratic (O(n²)), even though its average and best-case time complexities are linear (O(n)).
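
The same search sketched with a hash table; a Python set stands in for the hash table here (average O(1) per lookup and insert, worst case O(n) as described above):

def find_duplicate_hashing(arr):
    seen = set()            # hash table of values seen so far
    for v in arr:
        if v in seen:       # average O(1); worst case O(n) due to collisions
            return v
        seen.add(v)
    return None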

Sorting

Another solution, in case counting sort is not applicable, is to use another sorting algorithm. The worst-case time complexity for comparison-based sorting algorithms is, at best, O(n log n). The solution would be to sort the input array and look for duplicates in O(n) time.

This solution has worst-case and average time complexities of O(n log n), and depending on the sorting algorithm, a best-case linear time complexity (O(n)).
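
A sketch of the sort-then-scan approach, assuming Python's built-in sorted() (Timsort, a comparison sort with O(n log n) worst case):

def find_duplicate_sorting(arr):
    s = sorted(arr)                 # comparison sort: O(n log n) worst case
    for a, b in zip(s, s[1:]):      # after sorting, duplicates are adjacent
        if a == b:
            return a
    return None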

Nelfeal
  • Here, no condition is given on the size of the integers or their bound, so can I say merge sort is the best algorithm? – Garrick Aug 30 '16 at 18:43
  • The worst-case time complexity of quick sort (a comparison sort) is O(n^2): https://en.wikipedia.org/wiki/Quicksort#Worst-case_analysis – Kedar Mhaswade Aug 30 '16 at 18:44
  • @KedarMhaswade I think you misread (I edited). I'm pointing out that the worst-case time complexity is at best O(n log n). Sure, some algorithms have O(n²) or worse, but, for example, [smoothsort](https://en.wikipedia.org/wiki/Smoothsort) has O(n), O(n log n) and O(n log n) time complexities (best-case, average and worst-case, respectively). – Nelfeal Aug 30 '16 at 18:54
  • @Willturner Some algorithms are better than others, but you cannot say that merge sort is the best one. You can have a look at [this list](https://en.wikipedia.org/wiki/Sorting_algorithm#Comparison_of_algorithms); some algorithms have better time complexities than merge sort, but might perform worse in practice. The same goes for quicksort, heapsort, etc. – Nelfeal Aug 30 '16 at 18:59
  • What about this method: http://www.geeksforgeeks.org/find-duplicates-in-on-time-and-constant-extra-space/ – Garrick Aug 31 '16 at 02:47
  • @Willturner That has the same requirements as the counting sort solution (even tighter, in fact). – Nelfeal Aug 31 '16 at 04:00
  • @Nelxiost I read about hash tables, and in your answer you have written that in the worst case it can be O(n^2). I got that. But just a simple doubt: what if we used an amortized hash table? Then finding a pair can be done in O(n), and we don't have to worry about the hash table's worst case. That would also be better than comparison-based sorting. – Garrick Sep 08 '16 at 18:23
  • An amortized hash table doesn't make sense. You can talk about amortized complexity (in time or space), as in "amortized O(1)". You can't talk about amortized containers, that doesn't mean anything. A hash table always has amortized O(1) insertion and lookup time complexities. [You can read more about that here](http://stackoverflow.com/questions/3949217/time-complexity-of-hash-table). – Nelfeal Sep 08 '16 at 18:35

Following is counting sort, written here as runnable Python rather than pseudocode:

def counting_sort(items, k, key=lambda x: x):
    # items -- the array of items to be sorted; key(x) returns the key for item x
    # k     -- a number such that all keys are in the range 0..k-1
    count = [0] * k                 # count[i] -- occurrences of key i, initially zero
    output = [None] * len(items)    # the sorted result, same length as the input

    # calculate the histogram of key frequencies:
    for x in items:
        count[key(x)] += 1

    # calculate the starting index for each key:
    total = 0
    for i in range(k):   # i = 0, 1, ... k-1
        old_count = count[i]
        count[i] = total
        total += old_count

    # copy to output array, preserving order of inputs with equal keys:
    for x in items:
        output[count[key(x)]] = x
        count[key(x)] += 1

    return output
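
A quick usage example for the function above (the values serve as their own keys via the default key, and k is chosen to exceed the largest value):

arr = [4, 2, 7, 2, 5]
print(counting_sort(arr, k=8))   # [2, 2, 4, 5, 7] -- equal elements end up adjacent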

As you can observe, all the keys must be in the range 0..k-1. In your case the number itself is the key, and it has to be in a known range for counting sort to be applicable. Only then can this be done in O(n) time with O(k) extra space.

Otherwise, the solution is O(n log n) using any comparison-based sort.

Rishit Sanmukhani

If you subscribe to integer sorts being O(n), then by all means this is O(n) by sorting + iterating until two adjacent elements compare equal.

Hashing is actually O(n²) in the worst case (imagine the world's worst hashing algorithm, which hashes everything to the same index), although in practice using a hash table to get counts will give you linear-time performance (the average case).

In reality, linear time integer sorts "cheat" by fixing the number of bits used to represent an integer as some constant k that can then be ignored later. (In practice, though, these are great assumptions and integer sorts can be really fast!)

Comparison-based sorts like merge sort will give you O(n log n) complexity in the worst case.

The XOR solution you speak of is for finding a single unique "extra" item between two otherwise identical lists of integers.
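
For reference, a minimal sketch of that XOR trick for the problem it actually solves; the function name and sample lists are illustrative:

from functools import reduce
from operator import xor

def find_extra(a, b):
    # b contains every element of a plus one extra; identical values
    # cancel pairwise under XOR, leaving only the extra element
    return reduce(xor, a + b, 0)

print(find_extra([1, 2, 3], [3, 1, 4, 2]))   # 4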

AndyG
  • Can you please elaborate on the XOR line (the last line)? – Garrick Aug 30 '16 at 18:40
  • Sure, the general outline is that you have two lists of integers that are identical save for the fact that one of them has a single unique element. If you start with 0 and xor every element in both lists, they'll all cancel each other out except for the unique element. – AndyG Aug 30 '16 at 18:43
  • @AndyG There is also [Cuckoo hashing](https://en.wikipedia.org/wiki/Cuckoo_hashing), delivering *expected constant time* for insertions. But it's much harder to analyse and probably also uses some *tricks/cheats*, as you describe with counting-based sorting. – sascha Aug 30 '16 at 23:19
  • @sascha: Thanks for that, it's a new one on me. It appears that Cuckoo hashing is a randomized algorithm, which uses probability to get a good average time performance. Of course, due to randomness, there's always a chance of a worst case scenario, so the Big-O doesn't change, but the chance of a "bad" case becomes vanishingly small. – AndyG Aug 31 '16 at 13:08
  • @AndyG It's an amortized analysis of course. So the expected constant-time operation of adding is already averaged over all cases (including the worst case). A slow worst case is possible, but this means that the other inserts have to be much faster. – sascha Aug 31 '16 at 13:17