-4

We are writing a C# program that helps us remove unnecessary data repeaters. We have already found some repeaters to remove with the help of "Finding overlapping data in arrays". Now we want to check whether we can eliminate more repeaters using a different criterion. The question is:

We have arrays of numbers

{1, 2, 3, 4, 5, 6, 7, ...}, {4, 5, 10, 100}, {100, 1, 20, 50}

Some numbers are repeated in other arrays; others are unique and belong to only one array. We want to remove some arrays, accepting that we may lose up to N numbers from the whole collection.

Explanation:

  1. {1, 2}

  2. {2, 3, 4, 5}

  3. {2, 7}

Suppose we are ready to lose up to 3 numbers from these arrays. Then we can remove array 1, because we lose only the number "1", its only unique number. We can also remove arrays 1 and 3, because we lose only the numbers "1" and "7", or just array 3, because we lose only the number "7". In each case fewer than 3 numbers are lost.
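The bookkeeping in this example can be sketched as follows. This is an illustrative Python sketch (the question targets C#, but the logic ports directly); the function name `lost_numbers` and the use of 0-based indices are assumptions made for the example, not part of the original question.

```python
def lost_numbers(arrays, removed):
    """Return the set of numbers that vanish from the whole collection
    when the arrays at the (0-based) indices in `removed` are deleted."""
    removed = set(removed)

    # Every number that still appears in at least one kept array survives.
    kept = set()
    for i, arr in enumerate(arrays):
        if i not in removed:
            kept.update(arr)

    # Every number that appeared in a removed array is a candidate for loss.
    gone = set()
    for i in removed:
        gone.update(arrays[i])

    return gone - kept


arrays = [[1, 2], [2, 3, 4, 5], [2, 7]]
print(lost_numbers(arrays, [0]))     # removing array 1
print(lost_numbers(arrays, [0, 2]))  # removing arrays 1 and 3
print(lost_numbers(arrays, [2]))     # removing array 3
```

Note that a number shared only by removed arrays is also lost, which is why the loss is computed against the kept arrays rather than by summing per-array unique counts.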

The output should be the maximum number of arrays that can be removed under the condition that we lose fewer than N numbers, where N is the number of items we are ready to lose.
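To make the objective concrete, here is a minimal brute-force sketch in Python (not C#, purely for illustration). It tries the largest removal sets first and returns the first one whose loss fits the budget. The name `max_removable` is hypothetical, and the sketch assumes the budget is inclusive ("up to N"); change `<=` to `<` for a strict bound. It is exponential in the number of arrays, so it is only useful for checking small inputs.

```python
from itertools import combinations


def max_removable(arrays, n):
    """Return one largest list of (0-based) array indices whose removal
    loses at most `n` distinct numbers. Exhaustive search, small inputs only."""
    sets = [set(a) for a in arrays]
    all_numbers = set().union(*sets)

    # Try bigger removal sets first so the first hit is a maximum.
    for k in range(len(sets), 0, -1):
        for removed in combinations(range(len(sets)), k):
            kept = set()
            for i in range(len(sets)):
                if i not in removed:
                    kept |= sets[i]
            # Numbers lost = numbers no kept array contains any more.
            if len(all_numbers) - len(kept) <= n:
                return list(removed)
    return []


example = [[1, 2], [2, 3, 4, 5], [2, 7]]
print(max_removable(example, 3))
```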

genichm
  • When you say that you are ready to lose three numbers, do you mean that you can select any three numbers? Have you given any thought to how you would go about doing this? – Jim Mischel Dec 11 '14 at 15:51
  • @Jim Mischel No, I mean that I am ready to lose up to 3 numbers, no matter which ones. I cannot find any efficient solution to this problem. – genichm Dec 11 '14 at 15:54
  • But can you find *any* solution to the problem? Very often, finding even an inefficient solution can help you discover a more efficient way. – Jim Mischel Dec 11 '14 at 15:59
  • @Jim Mischel One idea is to take the shortest arrays with the highest number of overlaps and simply start removing them in different combinations, checking how many numbers are lost after each iteration. We can do this in parallel on a large scale. – genichm Dec 11 '14 at 16:30
  • Do you actually need all of the combinations or is there some selection criterion that you're planning to apply afterward? If it's the latter, then it's almost certainly a good idea to find an algorithm that takes this criterion into account in the first place. – David Eisenstat Dec 14 '14 at 12:47
  • @DavidEisenstat The output will be the combinations with the most arrays, no matter how far from or close to the target N they are, where N is the number of items we are ready to lose from our arrays. So I think we need all combinations. – genichm Dec 14 '14 at 13:33
  • Do the arrays contain unique numbers? That is, can one array contain `{2, 7, 7, 9, 15, 18, 18, 21}`? If so, does removing the number 7 mean removing all copies of the number 7 from the array? In general, I find your problem description incomplete, and your additional explanations confusing. I still don't know what you're really asking for. – Jim Mischel Dec 15 '14 at 16:25
  • @Jim Mischel A number cannot be repeated inside the same array but can be included in several different arrays. In {1,2},{2,3} the number 2 exists in both the first and the second array. The question is: I want to remove arrays and am ready to lose up to 3 numbers (for example). In this example the numbers 1, 2, 3 exist only in the arrays shown above, which means that if we remove these arrays we will lose the numbers 1, 2, 3 from the complete collection of all numbers. – genichm Dec 15 '14 at 19:49
  • What's the expected number of arrays? What's the expected max value for the numbers inside any array? Can I assume infinite memory since it's an algorithm question? And what's the big-O requirement? – Steve Dec 16 '14 at 21:46
  • @Steve We do not have any expected number of arrays or max value inside any array. The memory cannot be infinite, because we built a graph of all numbers with all linked numbers and combinations and it used up all the memory. There are also no big-O requirements. We already have a working algorithm, but if somebody gives a right answer, even if it is worse than ours, I will give 100 points of my reputation as promised. – genichm Dec 17 '14 at 21:59

2 Answers

1

This problem generalizes the Set Cover problem (take N = 0: removing as many arrays as possible without losing any number is the same as keeping a smallest set of arrays that covers every number), and thus efficient, exact solutions that work in general are unlikely. However, in practice, heuristics and approximations are often good enough. Given the similarity of your problem with Set Cover, the greedy heuristic is a natural starting point. Instead of stopping when you've covered all elements, stop when you've covered all but N elements; the arrays you never picked are the ones to remove.
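A minimal sketch of that greedy heuristic, in Python for brevity (the question is about C#, but the steps translate directly). The function name `greedy_removal` and the tie-breaking rule (lowest index first) are assumptions made for this example, and as a heuristic it is not guaranteed to remove the maximum possible number of arrays.

```python
def greedy_removal(arrays, n):
    """Greedy Set Cover variant: keep picking the array that covers the most
    still-uncovered numbers until at most `n` numbers remain uncovered.
    Returns (indices of arrays to remove, count of numbers lost)."""
    sets = [set(a) for a in arrays]
    total = len(set().union(*sets))

    covered = set()
    remaining = list(range(len(sets)))  # candidates not yet kept

    # Stop early once all but n numbers are covered.
    while len(covered) < total - n and remaining:
        # max() returns the first maximal element, so ties go to the lowest index.
        best = max(remaining, key=lambda i: len(sets[i] - covered))
        covered |= sets[best]
        remaining.remove(best)

    # Arrays that were never picked are the ones to remove.
    return remaining, total - len(covered)


example = [[1, 2], [2, 3, 4, 5], [2, 7]]
print(greedy_removal(example, 3))
```

On the three-array example from the question with N = 3, the heuristic keeps only the second array and proposes removing the first and third, losing the numbers 1 and 7.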

mhum
0

You first need to compute, for each array, how many of its numbers are unique to that particular array.
An easy way to do this is O(n²) since for each element, you need to check through all arrays if it's unique.
You can do this much more efficiently by having sorted arrays, sorting first or using a heap-like data structure.

After that, you only have to choose a set of arrays whose unique-number counts sum to at most N.
That's similar to the subset sum problem, but much less complex because N > 0 and all your numbers are > 0.
So you simply sort these counts from smallest to greatest, then iterate over the sorted list and take counts as long as the sum < N.
Finally, you can remove every array that corresponds to a count which you were able to fit into N.
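The two steps above can be sketched like this, again in Python for illustration. The names `unique_counts` and `pick_by_unique_count` are hypothetical, a frequency table stands in for the sorting or heap-based approach mentioned above, and the budget is treated as inclusive ("up to N"), which is an assumption.

```python
from collections import Counter


def unique_counts(arrays):
    """For each array, count the numbers that occur in no other array."""
    # freq[x] = number of arrays containing x (each array counted once).
    freq = Counter(x for arr in arrays for x in set(arr))
    return [sum(1 for x in set(arr) if freq[x] == 1) for arr in arrays]


def pick_by_unique_count(arrays, n):
    """Take arrays in order of increasing unique-number count while the
    running sum stays within `n`. Returns (picked indices, summed counts)."""
    counts = unique_counts(arrays)
    order = sorted(range(len(arrays)), key=lambda i: counts[i])

    picked, total = [], 0
    for i in order:
        if total + counts[i] > n:
            break
        picked.append(i)
        total += counts[i]
    return sorted(picked), total


example = [[1, 2], [2, 3, 4, 5], [2, 7]]
print(unique_counts(example))
print(pick_by_unique_count(example, 3))
```

The summed count is only a lower bound on the real loss: a number shared exclusively by arrays that are all removed is lost too, but is counted as unique to none of them. For `[[1, 2], [2, 3]]` with N = 2, this sketch picks both arrays with a summed count of 2, although removing both actually loses three numbers.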

Matt Ko
  • This solution has a problem. For example, I have many arrays and 2 of them are {1, 2}, {2, 3}, ... Each array has 1 unique number, but removing both loses 3 numbers and not 2, because they have the common number 2, which belongs to these two arrays only. In our case the shortest array has 93 numbers. In our output we want to give the maximum number of arrays that can be removed. – genichm Dec 15 '14 at 07:27
  • @genichm: I don't understand. If your shortest array contains 93 numbers, then you can't remove *any* array by losing only 3 numbers. Assuming that all the numbers in the array are unique. – Jim Mischel Dec 15 '14 at 16:24
  • @Jim Mischel No, as I mentioned in the question, not all numbers are unique. 3 numbers is only an example; it can be 20 numbers or 10 or 50, depending on the input conditions. – genichm Dec 15 '14 at 19:35