I apologize if this question does not belong here. My problem is not with the code but with the algorithm, so perhaps it is better suited for another site, but the good people of Stack Overflow have never let me down.
Here is the question:
Given 2 sorted arrays A and B with the same number of elements, let's say n, such that they do not share elements and no element appears twice in the same array, find the median of the union of the arrays in logarithmic time complexity.
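Very important note: if n is odd, then the median is the middle element. But if n is even, the median is not the average of the middle elements; it is defined as the minimum of the middle elements. (The union of A and B always has 2n elements, an even count, so the median asked for is the n-th smallest element of the union.)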
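To make this definition concrete, here is a minimal reference sketch that finds the median of the union by walking both arrays in order. It runs in O(n), so it is not the logarithmic solution asked for; it is only meant as a way to check expected results on small inputs. The name union_median_by_merge is made up for this sketch, and it assumes both arrays are sorted, have n >= 1 elements each, and share no elements.

```c
#include <assert.h>

// Reference only: returns the n-th smallest element of the union of a and b,
// which is the "minimum of the two middle elements" of the 2n values.
// Assumes a and b are sorted ascending, both have n elements, and are disjoint.
int union_median_by_merge(const int *a, const int *b, int n)
{
    int i = 0, j = 0, last = 0;

    assert(n >= 1);
    // Take the n smallest elements one at a time. Inside the loop i + j < n,
    // so both a[i] and b[j] are always valid reads.
    for (int taken = 0; taken < n; taken++)
    {
        if (a[i] < b[j])
            last = a[i++];
        else
            last = b[j++];
    }
    return last;
}
```

For A=[1,3,5] and B=[2,4,6] the three smallest elements of the union are 1, 2, 3, so the function returns 3.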
Solution: The idea is quite simple. Since they are sorted, we can find the median of A (called med1) and the median of B (called med2) in O(1). If med1>med2, then we know that the median of the union is an element of A that is smaller than med1 or an element of B that is larger than med2, and the reverse if med2>med1. So we throw away the redundant elements and repeat the process until A and B are sufficiently small, say with 2 elements each, and then we just need to find the median of these 4 numbers. Since 4 is an even number, the median of 4 numbers is the second minimum, which can be found in O(1).
This is my code:
#include <stdio.h>
#include <stdlib.h>

int *scan_array(int *array_length);
int second_min_four_numbers(int a, int b, int c, int d);
int first_question(int *arr1, int *arr2, int left1, int right1, int left2, int right2);

int main(void)
{
    int *arr1, *arr2, length_arr1 = 0, length_arr2 = 0;

    printf("For the first sorted array:\n");
    arr1 = scan_array(&length_arr1);
    printf("\nFor the second sorted array, enter %d numbers:\n", length_arr1);
    arr2 = scan_array(&length_arr2);
    if (length_arr1 == 1) // edge case: arrays are length one, return the min
    {
        if (arr1[0] > arr2[0])
            printf("The Median is %d", arr2[0]);
        else
            printf("The Median is %d", arr1[0]);
    }
    else
        printf("The Median is %d", first_question(arr1, arr2, 0, length_arr1 - 1, 0, length_arr2 - 1));
    free(arr1);
    free(arr2);
    return 0;
}
int *scan_array(int *array_length) // nothing fancy, just scan the array
{
    int *temp, array_element = 0, i = 0, *real_array;

    *array_length = 0;
    temp = (int *)malloc(50 * sizeof(int));
    printf("Enter positive numbers. To stop enter negative or zero.\nDon't enter more than 50 numbers\n");
    if (scanf("%d", &array_element) != 1) // treat invalid input as "stop"
        array_element = 0;
    while (array_element > 0 && i < 50) // never write past the 50-element buffer
    {
        (*array_length)++;
        temp[i] = array_element;
        i++;
        if (scanf("%d", &array_element) != 1)
            break;
    }
    real_array = (int *)malloc((*array_length) * sizeof(int));
    for (i = 0; i < *array_length; i++)
        real_array[i] = temp[i];
    free(temp);
    return real_array;
}
int first_question(int *arr1, int *arr2, int left1, int right1, int left2, int right2)
{
    int med1, med2;

    // we are done: reached 4 elements. we will always get here for arrays larger than 1 element each
    if (right1 - left1 + right2 - left2 == 2)
        return second_min_four_numbers(arr1[left1], arr1[right1], arr2[left2], arr2[right2]);
    med1 = arr1[(left1 + right1) / 2]; // not done. find the medians in O(1).
    med2 = arr2[(left2 + right2) / 2];
    if (med1 < med2) // the median of the union is somewhere between them
        return first_question(arr1, arr2, (left1 + right1) / 2, right1, left2, (left2 + right2) / 2);
    else
        return first_question(arr1, arr2, left1, (left1 + right1) / 2, (left2 + right2) / 2, right2);
}
int second_min_four_numbers(int a, int b, int c, int d) // find second min between four numbers
{
    int min = 0, second_min = 0; // very crude and inefficient, but simple to understand and still O(1)

    min = a;
    if (min > b)
        min = b;
    if (min > c)
        min = c;
    if (min > d)
        min = d;
    if (a == min)
    {
        second_min = b;
        if (second_min > c)
            second_min = c;
        if (second_min > d)
            second_min = d;
        return second_min;
    }
    if (b == min)
    {
        second_min = a;
        if (second_min > c)
            second_min = c;
        if (second_min > d)
            second_min = d;
        return second_min;
    }
    if (c == min)
    {
        second_min = a;
        if (second_min > b)
            second_min = b;
        if (second_min > d)
            second_min = d;
        return second_min;
    }
    // only remaining case: d == min (written without an if so every path returns a value)
    second_min = a;
    if (second_min > b)
        second_min = b;
    if (second_min > c)
        second_min = c;
    return second_min;
}
It is working as intended and compiles. As I said, the problem is not with my code, it's with the algorithm. Let's see an example that will demonstrate the problem:
Suppose our input was A=[1,3,5] and B=[2,4,6].
Then med1=3 and med2=4. Throw away the redundant elements and now we have A=[3,5] and B=[2,4]. Now we have only 4 elements overall, the data is sufficiently small, so just find the median of these 4 numbers [3,5,2,4]. The median would be 3, which is also the correct result for the median of the union of A and B, so the result is correct.
Now let's assume our input was A=[1,3,5,7] and B=[2,4,6,8]. med1=3 and med2=4. Throw away the redundant elements to get A=[3,5,7] and B=[2,4]. Now med1=5 and med2=2. Again throw away redundancy to get A=[3,5] and B=[2,4]. Now our data is sufficiently small, find the median of [3,5,2,4] which would again give us 3. But that result is incorrect. 3 is not the median of the union of A and B. The correct result would be 4.
How can we fix this problem?
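One direction that may be worth trying, offered as a sketch rather than a verified answer: in the failing trace, A keeps 3 elements while B keeps 2, so the two arrays shrink by different amounts and the wanted rank shifts. The sketch below assumes that this imbalance is the cause and always discards the same number of elements from both arrays. It probes indices i and j with i + j = n - 1, which guarantees that everything discarded from the low side ranks below the n-th smallest and everything discarded from the high side ranks above it. The function name union_lower_median is made up for this sketch; it assumes sorted, disjoint arrays of equal length n >= 1.

```c
#include <assert.h>

// Sketch: n-th smallest element of the union of two sorted, disjoint arrays
// of equal length n, discarding equally many elements from each per step.
int union_lower_median(const int *a, const int *b, int n)
{
    assert(n >= 1);
    while (n > 2)
    {
        int i = (n - 1) / 2; // index probed in a
        int j = n - 1 - i;   // index probed in b, chosen so that i + j == n - 1
        if (a[i] < b[j])
        {
            // a[0..i-1] rank below n and b[j+1..n-1] rank above n:
            // drop i elements from the front of a and i from the back of b
            a += i;
            n -= i;
        }
        else
        {
            // b[0..j-1] rank below n and a[i+1..n-1] rank above n:
            // drop j elements from the front of b and j from the back of a
            b += j;
            n -= j;
        }
    }
    if (n == 1)
        return a[0] < b[0] ? a[0] : b[0];
    // n == 2: second smallest of a[0] < a[1] and b[0] < b[1]
    if (a[0] < b[0])
        return a[1] < b[0] ? a[1] : b[0];
    return a[0] < b[1] ? a[0] : b[1];
}
```

For A=[1,3,5,7] and B=[2,4,6,8] this steps through sizes 4, 3, 2 and ends with [3,5] and [4,6], whose second minimum is 4. Each step removes roughly half of the remaining elements, so the number of steps is logarithmic in n.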