Algorithm challenge to merge sets

Question

Given n sets of numbers. Each set contains some numbers from 1 to 100. How to select sets to merge into the longest set under a special rule, only two non-overlapping sets can merge. [1,2,3] can merge with [4,5] but not [3,4]. What will be an efficient algorithm to merge into the longest set.

My first attempt is to form an n by n matrix. Each row/column represents a set. Entry(i,j) equals to 1 if two sets overlap, entry(i,i) stores the length of set i. Then the questions becomes can we perform row and column operations at the same time to create a diagonal sub-matrix on top left corner whose trace is as large as possible.

However, I got stuck in how to efficiently perform row and column operations to form such a diagonal sub-matrix on top left corner.

Can you please post first what your own attempt is to solve it? then we can help you with specific problems you have. — EkcenierK, Feb 05 '16 at 19:37
Have you seen Maximum Coverage Problem ? Your question sounds similar https://en.wikipedia.org/wiki/Maximum_coverage_problem — Some Guy, Feb 05 '16 at 20:48

score 2 · Accepted Answer · answered Feb 05 '16 at 22:30

As already pointed out in the comments (maximum coverage problem) you have a NP-hart problem. Luckily, matlab offers solvers for integer linear programming.

So we try to reduce the problem to the form:

min f*x subject to Ax<=b , 0<=x

There are n sets, we can encode a set as a vector of 0s and 1s. For example (1,1,1,0,0,...) would represent {1,2,3} and (0,0,1,1,0,0...) - {3,4}.

Every column of A represents a set. A(i,j)=1 means that the i-th element is in the j-th set, A(i,j)=0 means that the i-th element is not in the j-th set.

Now, x represents the sets we select: if x_j=1 than the set j is selected, if x_j=0 - than not selected!

As every element must be at most in one set, we choose b=(1, 1, 1, ..., 1): If we take two sets which contain the i-th element, than the i-th element of (Ax) would be at least 2.

The only question left is what is f? We try to maximize the number of elements in the union, so we choose f_j=-|set_j| (minus owing to min<->max conversion), with |set_j| - number of elements in the j-th set.

Putting it all in matlab we get:

f=-sum(A)
xopt=intlinprog(f.',1:n,A,ones(m,1),[],[],zeros(n,1),ones(n,1))

f.' - cost function as column
1:n - all n elements of x are integers
A - encodes n sets
ones(m,1) - b=(1,1,1...), there are m=100 elements
[],[] - there are no constrains of the form Aeq*x=beq
zeros(n,1), 0<=x must hold
ones(n,1), x<=1 follows already from others constrains, but maybe it will help the solver a little bit

score 0 · Answer 2 · answered Feb 05 '16 at 20:58

You can represent sets as bit fields. A bitwise and operation yielding zero would indicate non-overlapping sets. Depending on the width of the underlying data type, you may need to perform multiple and operations. For example, with a 64 bit machine word size, I'd need two words to cover 1 to 100 as a bit field.

Algorithm challenge to merge sets

2 Answers2