I have come up with divide and conquer algorithm for this. Just wanted to know if this would work or not?
First mid is calculated from the integer range i.e. (0+(1<<32-1))>>1 and then this idea is applied: range of number from start to mid or from mid to end will always be less than the numbers we are going to consider as we are considering billion numbers and there will definitely some numbers which are going to be repeated as the range of 32bit integer is much smaller compare to billion numbers.
def get_duplicate(input, start, end):
while True:
mid = (start >> 1) + end - (end >> 1)
less_to_mid = 0
more_to_mid = 0
equal_to_mid = 0
for data in input:
data = int(data, 16)
if data < mid:
less_to_mid += 1
elif data == mid:
equal_to_mid += 1
else:
more_to_mid += 1
if equal_to_mid > 1:
return mid
elif mid-start < less_to_mid:
end = mid-1
elif end-mid < more_to_mid:
start = mid+1
with open("codes\output.txt", 'r+') as f:
content = f.read().split()
print(get_duplicate(content, 0, 1<<32-1))
I know we can use bit array but I just want to get your views on this solution and if implementation is buggy.