
I have come up with a divide-and-conquer algorithm for this. I'd like to know whether it would work.

First, mid is computed from the integer range, i.e. `(0 + ((1<<32) - 1)) >> 1`. Then this idea is applied: there are more input numbers than there are distinct 32-bit values (2^32, about 4.29 billion), so some numbers must be repeated. It follows that the range from start to mid, or from mid to end, must contain more input numbers than it has distinct values, and the search continues in that range.
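A note on the range bound: in Python, binary `-` binds more tightly than `<<`, so `1<<32-1` evaluates to `1<<31` rather than the largest 32-bit value. A quick check of the precedence and of the midpoint of the full range:

```python
# In Python, binary minus has higher precedence than the shift operators.
assert (1 << 32 - 1) == 1 << 31        # parsed as 1 << (32 - 1)
assert (1 << 32) - 1 == 4294967295     # the largest unsigned 32-bit value

# Midpoint of the full unsigned 32-bit range [0, 2**32 - 1].
mid = (0 + ((1 << 32) - 1)) >> 1
print(mid)  # prints: 2147483647
```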

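def get_duplicate(input, start, end):
    # Invariant: more values of `input` fall in [start, end] than the range has
    # distinct values, so by the pigeonhole principle a duplicate lies inside it.
    while True:
        mid = start + (end - start) // 2   # midpoint of [start, end]
        less_to_mid = 0
        more_to_mid = 0
        equal_to_mid = 0
        for data in input:
            data = int(data, 16)           # input tokens are hex, as in the original
            if data < start or data > end:
                continue                   # count only values inside [start, end]
            if data < mid:
                less_to_mid += 1
            elif data == mid:
                equal_to_mid += 1
            else:
                more_to_mid += 1
        if equal_to_mid > 1:
            return mid
        elif less_to_mid > mid - start:    # [start, mid-1] has mid-start distinct values
            end = mid - 1
        elif more_to_mid > end - mid:      # [mid+1, end] has end-mid distinct values
            start = mid + 1
        else:
            return None                    # no half is over-full: no duplicate guaranteed

with open(r"codes\output.txt", "r") as f:
    content = f.read().split()
    print(get_duplicate(content, 0, (1 << 32) - 1))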
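The same halving idea can also be run without holding the whole file in memory. Below is a minimal sketch, not the asker's code: `find_duplicate_by_halving` and the `read_values` callback are hypothetical names, and `read_values` is assumed to return a fresh iterable of integers on every call (for the file above it might reopen it and yield `int(token, 16)` for each token). `bits` is a parameter only so the example can be checked on 4-bit values.

```python
from typing import Callable, Iterable, Optional, Tuple

def find_duplicate_by_halving(
    read_values: Callable[[], Iterable[int]], bits: int = 32
) -> Tuple[Optional[int], int]:
    """Binary search over the value range [0, 2**bits - 1].

    read_values() is assumed to return a fresh iterable over the whole input on
    each call (e.g. by reopening a file), so only three counters stay in memory.
    Returns (a duplicate or None, number of passes made over the input).
    """
    start, end = 0, (1 << bits) - 1
    passes = 0
    while True:
        mid = start + (end - start) // 2
        less = equal = more = 0
        passes += 1
        for x in read_values():
            if x < start or x > end:
                continue                  # outside the range still under consideration
            if x < mid:
                less += 1
            elif x == mid:
                equal += 1
            else:
                more += 1
        if equal > 1:
            return mid, passes
        if less > mid - start:            # [start, mid - 1] has only mid - start values
            end = mid - 1
        elif more > end - mid:            # [mid + 1, end] has only end - mid values
            start = mid + 1
        else:
            return None, passes           # no half is over-full: no duplicate guaranteed

# Usage: 17 four-bit values, so at least one must repeat.
values = list(range(16)) + [11]
dup, passes = find_duplicate_by_halving(lambda: values, bits=4)
print(dup, passes)  # prints: 11 2
```

Each pass at least halves the remaining range, so for `bits=32` it makes at most 33 passes over the input, which is the cost the first answer points out.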

I know we could use a bit array, but I'd like your views on this solution and whether the implementation is buggy.

noman pouigt

2 Answers


Your method is OK, but you will probably need to read the input many times to find the answer: each pass only halves the range, so 32-bit values can take about 32 passes.

Here is a variant that finds a duplicate using little memory and reads the input only twice.

  1. Initialize an array A[65536] of integers to zero.
  2. Read the numbers one by one. Every time a number x is read, add 1 to A[x mod 65536].
  3. When the reading ends, there will be at least one i such that A[i] is strictly bigger than 65536. This is because 65536 * 65536 = 4,294,967,296 < 4.3 billion, so the counts cannot all be at most 65536. Let us say A[i0] is bigger than 65536.
  4. Clear the array A to zero.
  5. Read the numbers again, but this time only look at those numbers x such that x mod 65536 = i0. For every such x, add 1 to A[x div 65536] (integer division).
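  6. When the reading ends, there will be at least one j such that A[j] is strictly bigger than 1, because the numbers counted in this pass take only 65536 distinct values and there are more than 65536 of them. Then the number 65536 * j + i0 is the final answer.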
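The two passes can be sketched in Python as below. This is a minimal sketch under stated assumptions: values are non-negative integers of `bits` bits (with `bits` even), and `read_values` is a hypothetical callback that yields the whole input again on each call. With `bits=32`, `size` is 65536, matching the steps above.

```python
from typing import Callable, Iterable, Optional

def find_duplicate_two_pass(
    read_values: Callable[[], Iterable[int]], bits: int = 32
) -> Optional[int]:
    assert bits % 2 == 0, "the split into two halves assumes an even bit width"
    size = 1 << (bits // 2)               # 65536 when bits == 32

    # Pass 1: count values by x mod size.
    counts = [0] * size
    for x in read_values():
        counts[x % size] += 1
    i0 = next((i for i, c in enumerate(counts) if c > size), None)
    if i0 is None:
        return None                       # no bucket is over-full: duplicate not guaranteed

    # Pass 2: among values with x mod size == i0, count by x div size.
    counts = [0] * size
    for x in read_values():
        if x % size == i0:
            j = x // size
            counts[j] += 1
            if counts[j] > 1:
                return size * j + i0      # the same x was seen twice
    return None

# Usage with 8-bit values: 257 numbers, so bucket 200 % 16 == 8 is over-full.
values = list(range(256)) + [200]
print(find_duplicate_two_pass(lambda: values, bits=8))  # prints: 200
```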
WhatsUp
  • I think just one 32-bit integer is enough to find the duplicates, right? Just use set-and-test: before setting a bit, test whether it is already set. If it is set, that is the answer; if not, set it. Why 500M? – noman pouigt Feb 16 '16 at 22:46
  • @noman pouigt: one 32-bit integer is not sufficient. You need 4.3 billion bits to track that many numbers. Just because 32 bits can have 4.3 billion states does not mean they can track the state of 4.3 billion items. – Vikhram Feb 16 '16 at 22:59
  • One `i` would be strictly bigger _if_ "all" 32-bit integer values were present at least once, which the problem statement doesn't mention. Watch out, too, for one value appearing more than 1<<32 times (a 32-bit counter would overflow). – greybeard Feb 17 '16 at 07:59
  • @greybeard In that case we can just stop once a counter reaches 65537, or check for overflow and discard further additions, no? – noman pouigt Feb 17 '16 at 09:31
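The capping idea works because step 3 only needs to know whether some counter exceeds 65536: no counter ever has to hold more than 65537, so fixed-width counters cannot overflow, and the first pass can stop as soon as one counter reaches that value. A minimal sketch of that first pass (`first_pass_capped` is a hypothetical name; `half` is 16 for 32-bit values and is a parameter only for testing):

```python
from array import array
from typing import Iterable, Optional

def first_pass_capped(values: Iterable[int], half: int = 16) -> Optional[int]:
    """Steps 1-3 of the two-pass method, with counters capped at size + 1."""
    size = 1 << half                      # 65536 when half == 16
    cap = size + 1                        # 65537: enough to prove a bucket is over-full
    counts = array("L", [0]) * size       # fixed-width unsigned counters (at least 32 bits)
    for x in values:
        i = x % size
        counts[i] += 1
        if counts[i] == cap:
            return i                      # bucket i is over-full: stop reading early
    return None                           # input ended with no over-full bucket

# Usage with half=4 (16 buckets): the extra 200 pushes bucket 8 to 17.
print(first_pass_capped(list(range(256)) + [200], half=4))  # prints: 8
```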

2^32 bits of memory (2^29 bytes, i.e. 512 MiB) is nothing special for modern systems, so you can simply use a bitset: it needs only one bit per possible value, and most modern languages have an implementation. The idea is to remember whether a number has already been seen:

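#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

// Iterator stands for any input source with hasNext()/next(); it is not a standard type.
template <typename Iterator>
void print_twice_seen(Iterator &it) // iterates through all numbers
{
    // One bit per possible 32-bit value: 2^32 bits = 512 MiB, assuming a 64-bit size_t.
    // Heap-allocated: a std::bitset this large as a local variable would overflow the stack.
    std::vector<bool> seen(std::size_t{1} << 32, false);

    while (it.hasNext()) {
        std::uint32_t val = it.next(); // return current element and move the iterator
        if (seen[val])
            std::cout << "Seen at least twice: " << val << std::endl;
        else
            seen[val] = true; // remember as seen
    }
}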
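For comparison with the question's Python, the same test-and-set bitset can be sketched with a `bytearray` used as a bit array. This is a minimal sketch; `twice_seen` is a hypothetical name, and `bits` is a parameter only so the example can run on small inputs.

```python
from typing import Iterable, Iterator

def twice_seen(values: Iterable[int], bits: int = 32) -> Iterator[int]:
    """Yield every value that has already appeared earlier in the input."""
    seen = bytearray(1 << max(bits - 3, 0))   # one bit per possible value, 8 bits per byte
    for val in values:
        byte, mask = val >> 3, 1 << (val & 7)
        if seen[byte] & mask:
            yield val                         # seen at least twice
        else:
            seen[byte] |= mask                # remember as seen

print(list(twice_seen([3, 5, 3, 7, 5, 3], bits=4)))  # prints: [3, 5, 3]
```

With `bits=32` the `bytearray` has 2^29 bytes, the same 512 MiB as the C++ version.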
ead
  • It is mentioned in the question that I know about the bitset method. Suppose the bitset method can't be used because of memory: is there any other method, and would my method work? – noman pouigt Feb 16 '16 at 18:35
  • @noman pouigt Sorry, I somehow overlooked the last sentence. But a bitset is not a problem for 2^32 numbers; it only needs about 0.5 GB. – ead Feb 16 '16 at 19:34