I am trying to merge k sorted arrays of structs into a single one. I know the algorithm that uses a min heap to merge the arrays, and I am using priority_queue in C++ to implement the heap. My code looks like this:

struct Num {
    int key;
    int val;
};

// Struct used in priority queue.
struct HeapNode
{
    Num num;              // Holds one element.
    int vecNum;           // Array number from which the element is fetched.
    int vecSize;          // Holds the size of the array.
    int next;             // Holds the index of the next element to fetch.
};

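// Struct used to compare nodes in a priority queue.
// std::priority_queue is a max-heap by default, so the comparison is
// reversed (>) to make the smallest element come out first.
struct CompareHeapNode
{
    bool operator()(const HeapNode& x, const HeapNode& y) const
    {
        return (x.num.key > y.num.key) || ( (x.num.key == y.num.key) && (x.num.val > y.num.val) );
    }
};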

vector<vector<Num>> v;
priority_queue< HeapNode, vector<HeapNode>, CompareHeapNode> p_queue;

//Insert the first element of the individual arrays into the heap.

while(!p_queue.empty())
{
    // top() returns a HeapNode, not a Num.
    HeapNode x = p_queue.top();
    cout << x.num.key << ' ' << x.num.val << '\n';
    p_queue.pop();

    // Push the next element from the same array, if there is one.
    if(x.next != x.vecSize) {
        HeapNode hq = {v[x.vecNum][x.next], x.vecNum, x.vecSize, x.next + 1};
        p_queue.push(hq);
    }
}

Let's consider 3 sorted arrays as shown below (each row is a key followed by a val).

Array1:             Array2:         Array3:
0 1                 0 10            0 0
1 2                 2 22            1 2
2 4                 3 46            2 819
3 7                 4 71            3 7321

The problem is that some elements can be common among the arrays, as shown above. So while merging the arrays, duplicate values appear in the sorted output. Is there any way to handle duplicate keys?

Pattu
  • You already handle them. How would you like to handle them _differently_? (make a SSCCE and print expected outcome...) – sehe Nov 12 '15 at 11:05
  • You want only unique values? – Surt Nov 12 '15 at 11:06
  • *duplicate values appear in the sorted array* -- Sounds like you should be using a `std::set`, where the items are already sorted and unique, not a vector or array. – PaulMcKenzie Nov 12 '15 at 11:07

1 Answer

So your question is whether there is a way to check if the value you are about to insert is already in the output list.

One solution is to use a hash table (unordered_set). Before inserting, check whether the element already exists in it. If it does not, insert the element into both the list and the hash table.
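
A minimal sketch of the hash table check, assuming that two elements count as duplicates when their keys are equal (the question asks about duplicate keys). The helper name `appendIfNew` and the `seenKeys` set are hypothetical, not part of the code in the question.

```cpp
#include <unordered_set>
#include <vector>

struct Num {
    int key;
    int val;
};

// Hypothetical helper: appends x to out only if its key has not been seen.
// Assumption: elements are duplicates when their keys match.
// Returns true when the element was appended.
bool appendIfNew(std::vector<Num>& out,
                 std::unordered_set<int>& seenKeys,
                 const Num& x)
{
    // insert() reports through .second whether the key was newly added,
    // so one call does both the lookup and the insertion.
    if (!seenKeys.insert(x.key).second) {
        return false;
    }
    out.push_back(x);
    return true;
}
```

In the merge loop, this call would replace the point where the popped element is written to the output. The set grows with the number of distinct keys, so it costs extra memory proportional to the output size.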

But you can do better. Since you are merging sorted arrays, the output is also sorted, so any duplicates will be adjacent in the output array. Before inserting, compare the value with the last value of the output and skip it if they are equal.
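
The last-element check can be sketched as follows. This is an illustration under stated assumptions, not the asker's exact code: duplicates are taken to mean equal keys (swap in a different `isDuplicate` to compare `val` as well), the heap node is simplified to an array number plus an index, and `mergeUnique` and `isDuplicate` are hypothetical names.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

struct Num {
    int key;
    int val;
};

// Simplified heap entry: the element, its source array, and its index there.
struct HeapNode {
    Num num;
    std::size_t vecNum;
    std::size_t index;
};

// Reversed comparison so std::priority_queue behaves as a min-heap.
struct CompareHeapNode {
    bool operator()(const HeapNode& x, const HeapNode& y) const {
        if (x.num.key != y.num.key) return x.num.key > y.num.key;
        return x.num.val > y.num.val;
    }
};

// Assumption: two elements are duplicates when their keys match.
bool isDuplicate(const Num& a, const Num& b) {
    return a.key == b.key;
}

std::vector<Num> mergeUnique(const std::vector<std::vector<Num>>& v) {
    std::priority_queue<HeapNode, std::vector<HeapNode>, CompareHeapNode> heap;

    // Seed the heap with the first element of every non-empty array.
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (!v[i].empty()) heap.push(HeapNode{v[i][0], i, 0});
    }

    std::vector<Num> out;
    while (!heap.empty()) {
        HeapNode top = heap.top();
        heap.pop();

        // The output is sorted, so a duplicate can only sit at the back.
        if (out.empty() || !isDuplicate(out.back(), top.num)) {
            out.push_back(top.num);
        }

        // Advance within the array the popped element came from.
        std::size_t next = top.index + 1;
        if (next < v[top.vecNum].size()) {
            heap.push(HeapNode{v[top.vecNum][next], top.vecNum, next});
        }
    }
    return out;
}
```

The check adds one comparison per popped element and no extra memory, so the merge stays O(N log k) for N total elements across k arrays. With the key-only predicate, the element kept for each key is the one with the smallest val, because that is the one the heap yields first.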

therainmaker
  • Removing duplicates from the final merged array seems like a nice approach. But I am dealing with a huge amount of data, so I don't want to add any additional complexity. – Pattu Nov 12 '15 at 11:20
  • @Pattu : To clarify, you aren't removing from the final merged array. My point was that since duplicates would always be together, you only need to check the last element of the output list to check for duplicates. If the value to insert and value at end are equal, don't insert. – therainmaker Nov 12 '15 at 11:22