-1

This isn't homework, but a question I encountered during my research. I need to know whether this problem is NP-hard or not. If it is, I need an approximation algorithm; if it is not, I need an efficient algorithm that gives me the optimal solution.

Informal description:

Imagine p persons using some of t tools. Every person uses only some of the tools, not all of them. Someone writes down who used which tool. Question: How do I find the largest group of persons for which each person used at least k tools that everyone else in the group used too? [prior problem description: the same tools as everyone else?] The number of tools is restricted to t'.

I developed a formal description of this problem, which might help:

Let G=(P,T,E) be a bipartite graph in which P represents the set of people and T the set of tools. There is an edge between a node p in P and t in T if the person used that tool. The goal is to find the sets P', T' for which the following conditions apply: 1) From any p' in P', any t' in T' can be reached with a single edge. 2) |P'|, i.e. the number of nodes in P', is maximum.

An inefficient approach would be to take each subset P' and calculate the intersection of the tool sets of all p' in P'. Unfortunately, the number of such subsets grows exponentially, and the calculation soon becomes intractable.
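The brute-force idea above can be sketched as follows. This is only an illustration under assumptions: the function name `largest_group_brute_force`, the input format (a dict from person to a set of tools), and reading the goal as "at least k shared tools" are not part of the original problem statement.

```python
from itertools import combinations


def largest_group_brute_force(uses, k):
    """Return (group, shared_tools) for a largest group sharing >= k tools.

    `uses` is a hypothetical input format: a dict mapping each person to the
    set of tools they used. Every subset of people is tried, largest first,
    so the running time is exponential in the number of people.
    """
    people = sorted(uses)
    for size in range(len(people), 0, -1):
        for group in combinations(people, size):
            # Tools used by every member of this candidate group.
            shared = set.intersection(*(set(uses[p]) for p in group))
            if len(shared) >= k:
                return set(group), shared
    return set(), set()
```

For example, with Dan using {hammer, knife, bevel}, Dave using {hammer, knife, square} and Eve using {level}, calling the function with k = 2 returns Dan and Dave together with the shared tools hammer and knife.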

Thank you very much!

Chris
  • I think your formal and informal descriptions are not equivalent. Informal: `p1` used `t1` and `t2`, and so did `p2`, but `p3` doesn't use any tools. So `p1` and `p2` are a group of persons that use the same tools, and it is maximal since `p3` doesn't use any tool. But in the formal definition you only have to satisfy that there is a path from `t1` to `t2`, which is given by `t1-p1-t2`, so we can add `p3` to the group and constraint 1 is still satisfied, but the group is bigger. – AbcAeffchen Jun 27 '14 at 19:43
  • @AbcAeffchen: You're absolutely right, thanks. I changed the condition in the formal description. – Chris Jun 27 '14 at 20:10
  • P' = P, T' = empty is optimal for the formal description. The informal description is solved by iterating over the people, counting tool sets and then picking the largest one. – Paul Hankin Jun 28 '14 at 10:12

2 Answers

1

To find the largest group of persons, for which each person uses the same tools as everyone else, you'll just need to group persons by the set of tools they use.

In other words:

  • Create a map: from (set of tools) to (count of persons using this set of tools)
  • Find the set of tools with the highest count.

This is definitely polynomial.
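A minimal sketch of the two steps above, assuming the usage data is available as a dict from person to a collection of tools (the function name and input format are illustrative, not from the question):

```python
from collections import Counter


def largest_identical_group(uses):
    """Return (tool_set, count) for the most common exact tool set.

    `uses` maps each person to the tools they used (assumed input format).
    """
    # Map: (set of tools) -> (number of persons using exactly this set).
    counts = Counter(frozenset(tools) for tools in uses.values())
    if not counts:
        return frozenset(), 0
    # Pick the set of tools with the highest count.
    tool_set, count = counts.most_common(1)[0]
    return tool_set, count
```

`frozenset` is used because it is hashable and ignores the order in which a person's tools were recorded.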

For example:

Suppose our tool set is {Claw Hammer, Tape Measure, Utility Knife, Moisture Meter, Chisel, Level, Screwdriver, Nail Set, Sliding Bevel, Layout Square} (source)

We'll create a map from a bit-set (expressed as an integer or as a string) to an integer (count of persons using this set of tools).

Now, if Dan's tools are {Claw Hammer, Utility Knife, Sliding Bevel}, we'll add the following to our map:

key: 1010000010, value: 1.

For adding another person, we'll first calculate the key. If Dave uses the same tools as Dan, we'll get the same key, so we'll just increase the count:

key: 1010000010, value: 2.
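The Dan and Dave example can be reproduced with a short sketch. The names `TOOLS`, `tool_key` and `counts` are illustrative, and a plain Python `dict` (a hash map) stands in for the sorted map discussed in this answer, so the lookup costs differ from the ones analysed here.

```python
# Fixed tool order; position i in the key corresponds to TOOLS[i].
TOOLS = [
    "Claw Hammer", "Tape Measure", "Utility Knife", "Moisture Meter",
    "Chisel", "Level", "Screwdriver", "Nail Set", "Sliding Bevel",
    "Layout Square",
]


def tool_key(used):
    """Encode a person's tools as a bit string over TOOLS."""
    return "".join("1" if tool in used else "0" for tool in TOOLS)


dan = {"Claw Hammer", "Utility Knife", "Sliding Bevel"}
dave = {"Claw Hammer", "Utility Knife", "Sliding Bevel"}

counts = {}
for person_tools in (dan, dave):
    key = tool_key(person_tools)
    # Insert the key with count 1, or increase the existing count.
    counts[key] = counts.get(key, 0) + 1

print(counts)  # {'1010000010': 2}
```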

--

  • Constructing a bit-set from a person's tool list is O(T).
  • Checking whether such a key already exists in the map is O(log(P)∙T): O(T) is the worst case for comparing two strings of length T, and it is probably much better in practice since the keys are sorted. The O(log(P)) factor also ignores the fact that the map is constructed iteratively.
  • Increasing the count is O(1). Alternatively, adding a new key to the map is O(log(P)) (actually it is better, because the map is constructed iteratively).

To summarize: you can construct the map for all persons in O(P∙log²(P)∙T). This is a loose upper bound, since the per-person costs listed above sum to O(P∙log(P)∙T). Again, you can do much better, but this is just to prove that it is polynomial.

Finding the key with the highest count is O(P): walk over the map, which contains P keys or fewer.

Lior Kogan
  • I agree with this for the informal problem description - this seems equivalent to getting everybody to write down on a piece of paper which tools they use and then sorting the pieces of paper. The formal description seems odd - T' looks like clique detection and there appear to be no constraints on P' at all. Hopefully Chris's informal description is the accurate version. – mcdowella Jun 27 '14 at 18:23
  • @mcdowella: As stated earlier, my informal description was wrong - sorry! – Chris Jun 27 '14 at 20:47
  • @LiorKogan: If I understood you correctly, you propose to iterate through all subsets of tools and look up all persons who use all of these tools. My problem is that the total number of tools is large: |T|=150. Even assuming only a small set of tools like 6, you have binomial(150,6) = approx. 1.42 x 10^10 combinations to check, which is not efficient at all. I wonder if a sampling approach is the only efficient way to approximate a solution. – Chris Jun 27 '14 at 20:54
  • @LiorKogan: Thank you very much for this great example. It really clarifies your idea. Unfortunately, the algorithm only counts "perfect matches" (not in the graph-theoretical sense) of the tools used. It can't find the maximum group of people that shares at least a subset of tools of size k. If Dan used a Claw Hammer and a Utility Knife but a Layout Square instead of a Sliding Bevel, he could still be in the same group as Dave for subsets of 2 tools. I updated my problem description above. I apologize for not stating this precisely enough. – Chris Jun 28 '14 at 09:08
  • @LiorKogan: One could assign two people to the same group if some similarity of their respective bitstrings exceeds a certain threshold. Unfortunately, this does not guarantee that all share the same subset. Even more, a sufficiently low k/low threshold would allow for two or more persons to exist in the group without sharing a single tool. – Chris Jun 28 '14 at 09:19
  • This new formulation is solved by picking either the empty subset of tools (in case that's allowed) or else the single tool with the most users. @Chris : are you sure you know what the problem is? – Paul Hankin Jun 28 '14 at 10:14
  • I'm really sorry for misstating my problem earlier. I think the main problem was that my informal and formal descriptions didn't match. – Chris Jul 22 '14 at 14:50
1

Definitely not NP-hard. I would suggest a greedy approach: just find the tool with the largest number of people using it. Suppose the largest such group shares 2 tools, A and B. Everyone in that group uses A and uses B, so its size can never be greater than max(the number of people using A, the number of people using B).
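A minimal sketch of this greedy idea, assuming the data is a dict from person to a collection of tools (the function name and input format are illustrative). Note that it only covers the case where sharing a single tool is enough.

```python
from collections import Counter


def most_used_tool(uses):
    """Return (tool, users) for the tool used by the most people.

    `uses` maps each person to the tools they used (assumed input format).
    """
    # Count each tool once per person, even if it is listed twice.
    counts = Counter(tool for tools in uses.values() for tool in set(tools))
    if not counts:
        return None, set()
    tool, _ = counts.most_common(1)[0]
    users = {person for person, tools in uses.items() if tool in tools}
    return tool, users
```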

Soumadeep Saha