
I have a set of nodes that each contain 0 or more points. There are duplicate points between nodes, but each node may contain points that are unique to that node.

For example:

  • Node A
    • Point 1
    • Point 2
    • Point 3
  • Node B
  • Node C
    • Point 1
    • Point 4
  • Node D
    • Point 2

etc.

Is there an algorithm or method to find the fewest nodes that together contain the greatest number of unique points, up to a specific limit?

In the above example, if I needed 4 unique points, I would get Node A and Node C, OR Node A and Node D.

Presently, I am solving this problem by sorting the list of nodes in descending order by the number of points (so Node A, Node C, Node D) and discarding nodes that have no points (Node B). I am then iterating over that list of nodes, counting unique points (and recording what Nodes are looked at), until I hit a defined threshold of unique points. So, in the above example, my result would be Node A and Node C.
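The approach described above can be sketched roughly as follows. This is only an illustration: the data shape (`name` plus a `points` array) and the function name `pickBySortedSize` are assumptions made for the example, not taken from the actual code.

```javascript
// Hypothetical data shape: each node has a name and an array of point ids.
const nodes = [
  { name: "A", points: [1, 2, 3] },
  { name: "B", points: [] },
  { name: "C", points: [1, 4] },
  { name: "D", points: [2] },
];

// Sort by point count (descending), drop empty nodes, then walk the list
// collecting unique points until the limit is reached.
function pickBySortedSize(nodes, limit) {
  const sorted = nodes
    .filter((n) => n.points.length > 0)
    .sort((a, b) => b.points.length - a.points.length);
  const seen = new Set();   // unique points collected so far
  const visited = [];       // names of the nodes looked at
  for (const node of sorted) {
    if (seen.size >= limit) break;
    visited.push(node.name);
    node.points.forEach((p) => seen.add(p));
  }
  return { nodes: visited, uniquePoints: seen.size };
}

const sortedResult = pickBySortedSize(nodes, 4);
console.log(sortedResult); // { nodes: [ 'A', 'C' ], uniquePoints: 4 }
```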

For what it's worth, I'm doing this in JavaScript, but I think my question is more about how to solve the problem than about a specific language. Apologies if this is the incorrect place to post.

Bergi
Tim AtLee
  • If all you have is the data, your method is about the best that can be done, though I would count unique points before the sort and then sort on the number of unique points. The effectiveness will of course depend on the data. If you have control of the generation of that data, you can optimise the search by keeping, for each point, a list of the nodes it is in, and also adding an additional count of unique points in the node. – Blindman67 Oct 12 '16 at 16:52
  • So you already found a solution that works. What do you want to improve? Elegance? Efficiency? Speed? How often do you need to do the operation (on the same nodes? with the same limits? etc)? What is your use case? – Bergi Oct 12 '16 at 17:01
  • Your statement about getting 4 unique points from "Node A and Node D" makes me think that perhaps there's something unsaid about the problem. The number of unique points within Node A and Node D is 3, since Point 2 is duplicated... – Heretic Monkey Oct 12 '16 at 17:42

2 Answers


From what I can see, without the limit, a reduction of Set Cover to your problem should be trivial. Your limit is not specified, so it could just as well span all possible points. As such, brute force is the only viable option. Note that even if the limit were specified further, I'd still guess the problem is NP-complete.

Sorting should not do the trick: the first n nodes after sorting could have many duplicate points, making it "better" to include nodes that have fewer points each.
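To illustrate the point about sorting, here is a minimal sketch that is not part of the original answer. The data set and the function name `pickByMarginalGain` are hypothetical. With a limit of 6, walking the nodes in size order (A, B, C) looks at three nodes because B adds nothing new, whereas picking the node that adds the most unseen points at each step needs only A and C. This is a greedy heuristic and is not guaranteed to find the fewest nodes in general.

```javascript
// Hypothetical data chosen so that the two largest nodes overlap heavily.
const overlapping = [
  { name: "A", points: [1, 2, 3, 4] },
  { name: "B", points: [1, 2, 3] },
  { name: "C", points: [5, 6] },
];

// Greedy by marginal gain: at each step pick the node that contributes
// the most points not yet seen, until the limit is reached.
function pickByMarginalGain(nodes, limit) {
  const seen = new Set();
  const chosen = [];
  const remaining = nodes.slice(); // copy so the input is not mutated
  while (seen.size < limit && remaining.length > 0) {
    let bestIndex = -1;
    let bestGain = 0;
    remaining.forEach((node, i) => {
      const gain = new Set(node.points.filter((p) => !seen.has(p))).size;
      if (gain > bestGain) {
        bestGain = gain;
        bestIndex = i;
      }
    });
    if (bestIndex === -1) break; // no remaining node adds anything new
    const [best] = remaining.splice(bestIndex, 1);
    best.points.forEach((p) => seen.add(p));
    chosen.push(best.name);
  }
  return { nodes: chosen, uniquePoints: seen.size };
}

const gainResult = pickByMarginalGain(overlapping, 6);
console.log(gainResult); // { nodes: [ 'A', 'C' ], uniquePoints: 6 }
```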

ASDFGerte
  • I don't think it's related to the set cover problem, as the OP doesn't necessarily want to cover the whole universe. – Bergi Oct 12 '16 at 17:28
  • @Bergi as written, "without the limit". He didn't add further constraints on his limit. E.g. in his example, if the limit is four, that is equivalent to all points being included. Then, in Wikipedia's words, the universe is all points and the collection of sets is the set of nodes. What is sought is the minimum number of nodes that have the maximum number of unique points, which is then everything, and equivalent to the union of the nodes being the universe. With a limit, e.g. 12, I only dare guess it is NP-complete. – ASDFGerte Oct 12 '16 at 17:43
  • Ah, you're right. But I think it's only the limit that makes the problem really interesting :-D – Bergi Oct 12 '16 at 17:54
  • OK, I found the name of the problem: *k-partial set cover*, as discussed [here](http://crab.rutgers.edu/~rajivg/publications/journal/jalg04.pdf) – Bergi Oct 12 '16 at 18:22

You could build all possible combinations, sum the values of each, keep only the combinations that reach the target value, and order the result set by the desired constraints.

Take the top items as the result.

var array = [3, 0, 2, 1], // number of points in nodes A, B, C, D
    i,
    result = [],
    values,
    sum;

// filter zero values
array = array.filter(function (a) {
    return a;
});

// generate all possible combinations (one bitmask per subset) and build each sum
for (i = 0; i < 1 << array.length; i++) {
    sum = 0;
    values = array.filter(function (a, j) {
        if (i & (1 << j)) {
            sum += a;
            return true;
        }
    });
    // add only relevant items to the result set
    sum >= 4 && result.push({ values: values, sum: sum });
}

// sort by priority
result.sort(function (a, b) {
    return a.values.length - b.values.length || a.sum - b.sum;
});

console.log(result);
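The snippet above sums point counts, so a point shared by two nodes is counted twice. A variant of the same bitmask enumeration that counts unique points might look like the sketch below. It is not part of the original answer; the data shape and the function name `smallestCovers` are assumptions for illustration. It is exponential in the number of nodes, and the `1 << n` bitmask limits it to fewer than 31 non-empty nodes.

```javascript
// Hypothetical data shape mirroring the example in the question.
const nodeList = [
  { name: "A", points: [1, 2, 3] },
  { name: "B", points: [] },
  { name: "C", points: [1, 4] },
  { name: "D", points: [2] },
];

// Enumerate every non-empty subset of the non-empty nodes and keep those
// whose union of points reaches the limit.
function smallestCovers(nodes, limit) {
  const usable = nodes.filter((n) => n.points.length > 0);
  const results = [];
  for (let mask = 1; mask < 1 << usable.length; mask++) {
    const subset = usable.filter((_, j) => mask & (1 << j));
    const unique = new Set(subset.flatMap((n) => n.points)); // removes duplicates
    if (unique.size >= limit) {
      results.push({
        nodes: subset.map((n) => n.name),
        uniquePoints: unique.size,
      });
    }
  }
  // Fewest nodes first; among equals, more unique points first.
  results.sort(
    (a, b) => a.nodes.length - b.nodes.length || b.uniquePoints - a.uniquePoints
  );
  return results;
}

const covers = smallestCovers(nodeList, 4);
console.log(covers[0]); // { nodes: [ 'A', 'C' ], uniquePoints: 4 }
```

With the question's example data and a limit of 4, the only two-node result is Node A with Node C. Node A with Node D yields 3 unique points, as one of the comments on the question notes.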
Nina Scholz
  • When taking the sum, you have to filter out duplicates (a point can be part of multiple nodes). But either way, that sounds much less efficient than what the OP is currently doing. – Bergi Oct 12 '16 at 17:16