Quickly finding if there are 2 or more equal numbers

Question

I have an array of N different numbers that change frequently. After every change there is a chance that two or more numbers have become equal, and I don't want that. The number N can be as big as the maximum possible integer. Knowing that changes happen frequently, I don't want to compare each number with each one of the rest after each change.

How can I quickly find if there are at least 2 equal numbers in the array?

Or if you can't sort the data for whatever reason (like you need the data to be in the order in which it was added to the array), then maintain a hash table of the numbers in the array and check to see if a number is in the hash table before you add it to the array — Zim-Zam O'Pootertoot, Jun 17 '13 at 16:25
Hashing actually sounds like the smarter variant, given that binary trees have that log n there, and hashing structures usually are implemented to have contains() delete() add() in constant time. — G. Bach, Jun 17 '13 at 16:53
Do you know which elements (or how many?) changed after each change? Otherwise you would end up having to hash the entire set every time. — imreal, Jun 17 '13 at 17:49

sds · Accepted Answer · 2013-06-18T20:17:26.020

It really depends on what other constraints you have, e.g.:

Do you need to maintain the order in which the numbers come in?
Are the numbers only ever added, or are they deleted too?
What is a more common operation: add/delete or check for dupes?
What do you need to keep - the set (i.e., unique numbers) or the multiset (numbers with their multiplicities)?

There are two basic options: a Binary Search Tree and a Hash Table.

The former will give you O(log(n)) operations on average, the latter O(1) on average; the actual results will depend on what kind of stream you have (are the numbers random? increasing? do they follow a weird non-obvious pattern?)

If you decide to go for BST, remember that you will have to keep it balanced.

Example (untested)

(defparameter *my-data-array* (make-array 100000))
;; fill *my-data-array*
(defparameter *my-data-table*
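  (let ((ht (make-hash-table)))
    ;; count how many times each value occurs in the array;
    ;; use the local table HT, since *my-data-table* is not bound yet
    (loop for v across *my-data-array*
          do (incf (gethash v ht 0)))
    ht))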
(defun modify-data-array (pos new-value)
  (let* ((old-value (aref *my-data-array* pos))
         (old-count (decf (gethash old-value *my-data-table*)))
         (new-count (incf (gethash new-value *my-data-table* 0))))
    (setf (aref *my-data-array* pos) new-value)
    (case old-count
      (0 ; old-value is now not in the array
       ...)
      (1 ; old-value is now unique
       ...)
      (t ; old-value is still not unique
       ...))
    (case new-count
      (1 ; new-value was not in the array before
       ...)
      (2 ; new-value was unique before, but not anymore
       ...)
      (t ; new-value was not unique
       ...))))
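
The same number->count idea can be sketched in Python (used here instead of the Lisp above only to keep the sketch short). This is a minimal illustration, not a tuned implementation: the class name `DuplicateTracker` and its methods are hypothetical, and it assumes the array has a fixed size and that changes arrive one position at a time. Besides the count table it keeps a running count of values that occur more than once, so the check for duplicates after each change is O(1) on average.

```python
from collections import Counter


class DuplicateTracker:
    """Fixed-size array plus a value -> count table (hypothetical helper)."""

    def __init__(self, values):
        self.data = list(values)            # original order is preserved
        self.counts = Counter(self.data)    # value -> number of occurrences
        # Number of distinct values that currently occur more than once.
        self.duplicated = sum(1 for c in self.counts.values() if c > 1)

    def set(self, pos, new_value):
        """Replace data[pos] and update the bookkeeping."""
        old_value = self.data[pos]
        if new_value == old_value:
            return                          # nothing changes
        self.counts[old_value] -= 1
        if self.counts[old_value] == 1:
            self.duplicated -= 1            # old value is unique again
        if self.counts[old_value] == 0:
            del self.counts[old_value]      # old value left the array
        self.counts[new_value] += 1
        if self.counts[new_value] == 2:
            self.duplicated += 1            # new value just became a duplicate
        self.data[pos] = new_value

    def has_duplicates(self):
        return self.duplicated > 0


tracker = DuplicateTracker([1, 2, 3, 4])
tracker.set(0, 2)                # array is now [2, 2, 3, 4]
print(tracker.has_duplicates())  # True
tracker.set(1, 5)                # array is now [2, 5, 3, 4]
print(tracker.has_duplicates())  # False
```

The array itself is never reordered, so this fits the case where the insertion order has to be kept.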

I have an array of N different numbers that CHANGE frequently- the size of the array stays the same at all times and the elements are NOT sorted (I need them this way). For now I'm thinking for the BST algorithm. — AlexSavAlexandrov, Jun 18 '13 at 19:49
@AlekZanDer/: so the array is fixed and filled at all times and its elements change all the time? I suggest a hash table number->count instead of the BST (remember that you will have to rebalance the tree - check out AVL trees). — sds, Jun 18 '13 at 19:57

score 0 · Answer 2 · answered Jun 18 '13 at 05:46

As a variant, you might use a Bloom filter. It lets you test whether a given number has already been added, but it can give false positives. On the other hand, Bloom filters are space-efficient and fast, and they let you keep your array as it is. A Bloom filter will be useful if you expect repeated numbers to be rare; otherwise you will have to recheck numbers in linear time too often.
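
A rough Python sketch of that idea follows. Everything in it is an assumption made for illustration: the class `BloomFilter`, the helper `is_duplicate`, the filter size, and the number of hash functions are all hypothetical choices. A "no" from the filter is definite; a "maybe" is confirmed with a linear scan of the array. Note that a plain Bloom filter cannot remove entries, so when values in the array are overwritten the stale entries stay in the filter and make false positives more likely.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over a bit array (illustrative sizes only)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value):
        # Derive num_hashes bit positions by salting one hash function.
        data = repr(value).encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(data, salt=i.to_bytes(8, "little")).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, value):
        for p in self._positions(value):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, value):
        # False means "definitely not added"; True means "possibly added".
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(value))


def is_duplicate(array, bloom, value):
    """Check the filter first; fall back to a linear scan on a 'maybe'."""
    if not bloom.might_contain(value):
        return False
    return value in array  # rules out false positives


numbers = [10, 20, 30]
bloom = BloomFilter()
for n in numbers:
    bloom.add(n)
print(is_duplicate(numbers, bloom, 20))  # True
print(is_duplicate(numbers, bloom, 99))  # False
```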
