Here's some background on my problem (it's not homework). I have 4 items to choose from at a grocery store, and I want to figure out what percentage of each I can fit in my basket, with the constraint that no item can be less than 20% or more than 60%. The program currently starts at 0 and works its way up (0 stands for the minimum value, in this case 20, and anything above 0 is how many percentage points over the minimum the item is). Using the above example of a minimum of 20, [20,0,0,0] would equal 40 + 20 + 20 + 20 = 100. The program generates permutations for all the items.

After running it, I realized (using the above example) that the first item is [20,0,0,0] and the last item is [0,0,0,20]. I tested it by comparing all the results against reversed versions of themselves, and they all exist, so I thought I could cut my processing time in half by only processing half the list and then flipping those results around. I ran into a problem when I started checking for reversed matches while the program was still generating results: there is no single point where it flips over and starts duplicating (I was hoping it would happen exactly halfway through). Here is the output of the results; the first column is the result itself and the second is the indexOf the reversed version of it (-1 means it's not found). It starts off as all -1, then starts to find a few reversed matches but still has many -1, and eventually it transitions to having all repeated results, but it doesn't seem to do so in a predictable, clear-cut way.

My end goal is to use larger lists, and I'm trying to figure out a way to generate as much data as possible, so any suggestions on how to identify the pattern, or other speed improvements, would be great. I'm thinking that in this case I either need a method that identifies the pattern once it's completely developed (i.e. no more chance of -1) or need to tweak the algorithm so it generates in an order that switches over fully instead of through a slow partial change.

If it helps, here's the code I'm using (note: the number of items and the ranges are variables, so I'm trying to find a general pattern and not one specific to any hard numbers):

import java.util.*;

public class Distributor {
    // expected number of results for the settings in main: 1771
    private ArrayList<int[]> result = new ArrayList<int[]> ();
    public Distributor (final String [] names, int [] low, int [] high) 
    {
        final int rest = 100;
        int minimum = 0;
        for (int l : low)
            minimum += l;
        int [] sizes = new int [names.length];
        distribute (0, low, high, rest - minimum, sizes);
        System.out.println ("total size of results are " + result.size ());
    }

    public static void main (String [] args) {
        System.out.println("starting..Hold on!!");
        final String [] names = new String [] {"a", "b", "c", "d"}; //name of items
        int [] low = new int [names.length];
        int [] high = new int [names.length];
        Arrays.fill(low, 20); //auto fill the range of items with a min/max range
        Arrays.fill(high, 60);
        new Distributor (names, low, high);
    }

    /*
        distribute the rest of values over the elements in sizes, beginning with index i.           
    */
    void distribute (int i, int [] low, int [] high, final int rest, int [] sizes) {
        if (i == sizes.length - 1) { // last item: it must absorb whatever is left
            // sizes holds offsets above low[i], so the cap is high[i] - low[i]
            if (rest <= high [i] - low [i]) {
                sizes [i] = rest;
                checkResultsDuringRuntime (Arrays.copyOf (sizes, sizes.length), result);
                result.add (Arrays.copyOf (sizes, sizes.length));
            }
        }
        else {
            // the most the items after i can still absorb
            int capacityAfter = 0;
            for (int j = i + 1; j < sizes.length; j++)
                capacityAfter += high [j] - low [j];
            // if they cannot absorb all of rest, item i must take at least the excess
            int startLoop = Math.max (0, rest - capacityAfter);
            // try every offset for item i, then recurse on the remaining items
            for (int c = startLoop; c <= Math.min (high [i] - low [i], rest); ++c) {
                sizes [i] = c;
                distribute (i + 1, low, high, rest - c, sizes);
            }
        }
    }

    private static void checkResultsDuringRuntime(int[] defaultList, ArrayList<int[]> results) {
        // build the reversed copy of the candidate
        int[] reversedList = new int[defaultList.length];
        for (int x = defaultList.length - 1, y = 0; x >= 0; x--, y++) {
            reversedList[y] = defaultList[x];
        }
        // look for the reversed copy among the results produced so far
        int matchLocation = -1;
        for (int idx = 0; idx < results.size(); idx++) {
            if (Arrays.equals(results.get(idx), reversedList)) {
                matchLocation = idx;
                break;
            }
        }
        System.out.println(Arrays.toString(defaultList) + " = " + matchLocation);
    }
}

output: http://pastebin.com/6vXRvVit

Edit: The program is not generating duplicates. It's generating permutations that seem to reverse themselves eventually. I want to capture the point from which every reversed permutation would match an existing result, so that instead of processing the data further I can just reverse the existing results. Check out the output above for what I'm describing.

Lostsoul
  • kinda odd that [the same question](http://stackoverflow.com/questions/10743499/creating-a-list-of-all-possible-percentages-of-items/) was asked within a day of this one. Of course that's just coincidence, and neither is really homework, right? lol – goat May 26 '12 at 23:48
  • @chris we're actually both working on solving the same problem together. I just decided to post the question because I want to improve this functionality. You helped us get started but this question is not the same, your question was how to get started this is how to improve performance by reducing extra work. – Lostsoul May 26 '12 at 23:59

2 Answers

I don't fully understand the question, but there's a trick to solving similar problems. Say you want to generate 3 numbers from 1 to 6, [1, 4, 2] for example, but you want to disregard duplicates; that is, [1,2,3] = [3,2,1].

Here's how:

for (int i = 1; i <= 6; i++) {
    for (int j = i + 1; j <= 6; j++) {
        for (int k = j + 1; k <= 6; k++) {
            System.out.println(i + "," + j + "," + k);
        }
    }
}

The output will include all possibilities, but no duplicate permutations.

edit - here's the output...

1,2,3
1,2,4
1,2,5
1,2,6
1,3,4
1,3,5
1,3,6
1,4,5
1,4,6
1,5,6
2,3,4
2,3,5
2,3,6
2,4,5
2,4,6
2,5,6
3,4,5
3,4,6
3,5,6
4,5,6
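
The three nested loops hard-code the number of picks. As a rough sketch of the same idea for any pick count, the recursion below starts each level one above the previous pick, just like `j = i + 1` in the loops. The class and method names (`IncreasingTuples`, `combinations`, `fill`) are made up for illustration and are not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;

class IncreasingTuples {
    // Returns every strictly increasing tuple of `picks` numbers drawn from 1..max.
    static List<int[]> combinations(int max, int picks) {
        List<int[]> out = new ArrayList<>();
        fill(new int[picks], 0, 1, max, out);
        return out;
    }

    // Fills position pos with every value from start..max, then recurses on the next position.
    private static void fill(int[] current, int pos, int start, int max, List<int[]> out) {
        if (pos == current.length) {
            out.add(current.clone());
            return;
        }
        for (int v = start; v <= max; v++) {
            current[pos] = v;
            fill(current, pos + 1, v + 1, max, out); // next pick must be larger
        }
    }

    public static void main(String[] args) {
        for (int[] t : combinations(6, 3)) {
            System.out.println(t[0] + "," + t[1] + "," + t[2]);
        }
    }
}
```

Calling `combinations(6, 3)` should reproduce the 20 lines listed above, in the same order.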

EDIT 2 - For the OP's problem where there are 4 items with the limits of 20 and 60, there are 101,270 permutations. That assumes integer percentages are acceptable. That is, 25%, not 25.1%

EDIT 3 - yah, this one is fun. The OP said that the percentages had to add up to 100. I missed that. There are 108 possibilities. They are:

1 : [20,20,20,40]
2 : [20,20,21,39]
3 : [20,20,22,38]
4 : [20,20,23,37]
5 : [20,20,24,36]
6 : [20,20,25,35]
7 : [20,20,26,34]
8 : [20,20,27,33]
9 : [20,20,28,32]
10 : [20,20,29,31]
11 : [20,20,30,30]
12 : [20,21,21,38]
13 : [20,21,22,37]
14 : [20,21,23,36]
15 : [20,21,24,35]
16 : [20,21,25,34]
17 : [20,21,26,33]
18 : [20,21,27,32]
19 : [20,21,28,31]
20 : [20,21,29,30]
21 : [20,22,22,36]
22 : [20,22,23,35]
23 : [20,22,24,34]
24 : [20,22,25,33]
25 : [20,22,26,32]
26 : [20,22,27,31]
27 : [20,22,28,30]
28 : [20,22,29,29]
29 : [20,23,23,34]
30 : [20,23,24,33]
31 : [20,23,25,32]
32 : [20,23,26,31]
33 : [20,23,27,30]
34 : [20,23,28,29]
35 : [20,24,24,32]
36 : [20,24,25,31]
37 : [20,24,26,30]
38 : [20,24,27,29]
39 : [20,24,28,28]
40 : [20,25,25,30]
41 : [20,25,26,29]
42 : [20,25,27,28]
43 : [20,26,26,28]
44 : [20,26,27,27]
45 : [21,21,21,37]
46 : [21,21,22,36]
47 : [21,21,23,35]
48 : [21,21,24,34]
49 : [21,21,25,33]
50 : [21,21,26,32]
51 : [21,21,27,31]
52 : [21,21,28,30]
53 : [21,21,29,29]
54 : [21,22,22,35]
55 : [21,22,23,34]
56 : [21,22,24,33]
57 : [21,22,25,32]
58 : [21,22,26,31]
59 : [21,22,27,30]
60 : [21,22,28,29]
61 : [21,23,23,33]
62 : [21,23,24,32]
63 : [21,23,25,31]
64 : [21,23,26,30]
65 : [21,23,27,29]
66 : [21,23,28,28]
67 : [21,24,24,31]
68 : [21,24,25,30]
69 : [21,24,26,29]
70 : [21,24,27,28]
71 : [21,25,25,29]
72 : [21,25,26,28]
73 : [21,25,27,27]
74 : [21,26,26,27]
75 : [22,22,22,34]
76 : [22,22,23,33]
77 : [22,22,24,32]
78 : [22,22,25,31]
79 : [22,22,26,30]
80 : [22,22,27,29]
81 : [22,22,28,28]
82 : [22,23,23,32]
83 : [22,23,24,31]
84 : [22,23,25,30]
85 : [22,23,26,29]
86 : [22,23,27,28]
87 : [22,24,24,30]
88 : [22,24,25,29]
89 : [22,24,26,28]
90 : [22,24,27,27]
91 : [22,25,25,28]
92 : [22,25,26,27]
93 : [22,26,26,26]
94 : [23,23,23,31]
95 : [23,23,24,30]
96 : [23,23,25,29]
97 : [23,23,26,28]
98 : [23,23,27,27]
99 : [23,24,24,29]
100 : [23,24,25,28]
101 : [23,24,26,27]
102 : [23,25,25,27]
103 : [23,25,26,26]
104 : [24,24,24,28]
105 : [24,24,25,27]
106 : [24,24,26,26]
107 : [24,25,25,26]
108 : [25,25,25,25]

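We notice that the upper limit of 60 is too large: if one item were at 60%, the three other items could not all be added at 20%, the minimum value. That would be 120%. With three items at the 20% minimum, the fourth can be at most 40%.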
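
One way the list above could be produced is sketched below, assuming integer percentages and the same bounds for every item. Each position starts at the value of the previous one, so only non-decreasing tuples come out, and the last position takes whatever is left. The names `SortedSplits`, `sortedSplits` and `fill` are hypothetical, and the sketch assumes at least one item.

```java
import java.util.ArrayList;
import java.util.List;

class SortedSplits {
    // Collects every non-decreasing tuple of `items` integer percentages,
    // each in [low, high], whose sum is `total`.
    static List<int[]> sortedSplits(int items, int low, int high, int total) {
        List<int[]> out = new ArrayList<>();
        fill(new int[items], 0, low, high, total, out);
        return out;
    }

    private static void fill(int[] current, int pos, int min, int high, int remaining, List<int[]> out) {
        int slotsLeft = current.length - pos;
        if (slotsLeft == 1) {
            // the last slot must take exactly what is left, without breaking the ordering
            if (remaining >= min && remaining <= high) {
                current[pos] = remaining;
                out.add(current.clone());
            }
            return;
        }
        // every later slot is at least v, so v * slotsLeft may not exceed remaining
        for (int v = min; v * slotsLeft <= remaining && v <= high; v++) {
            current[pos] = v;
            fill(current, pos + 1, v, high, remaining - v, out);
        }
    }

    public static void main(String[] args) {
        List<int[]> splits = sortedSplits(4, 20, 60, 100);
        System.out.println(splits.size() + " sorted splits");
    }
}
```

For 4 items between 20 and 60 that sum to 100, this should give the same 108 tuples, from [20,20,20,40] to [25,25,25,25].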

Tony Ennis
  • I think the time complexity on that will destroy it for large values of N – Woot4Moo May 26 '12 at 23:40
  • The technique avoids all unnecessary sorting and checking; it produces exactly the minimal set of inputs and cannot produce the dups the OP is trying to avoid. Now, if OP's percentages aren't limited to 'fixed' percentages, that is, a valid percent can only be represented by a float or double, then this technique doesn't work. – Tony Ennis May 26 '12 at 23:43
  • you have a 3N solution. whereas the bloom filter is N. Unless I am missing something. – Woot4Moo May 26 '12 at 23:52
  • Tony I think the results is 1771 using the above example, but Woot4Moo is correct there will be a larger list. – Lostsoul May 26 '12 at 23:56
  • I still got nothing close to that. I'm at 108. Which constraint am I imposing that I should not be? – Tony Ennis May 27 '12 at 00:00
  • @Woot4Moo I'll have to read up on the Bloom filter. Thanks for posting about it. I can't believe it is quicker than simply not generating unwanted dups to begin with, however. – Tony Ennis May 27 '12 at 00:02
  • @TonyEnnis I believe the space would be less (obviously by not generating everything) but the time would grow out of control. – Woot4Moo May 27 '12 at 00:04
  • That would only be the case if the percentages were represented as floats. If they are integers (perhaps _discrete_ is a better term), then adding more items really doesn't increase the time. Yes, the loops are nested (so in the toy example it is indeed O(n^4)), but the K is tiny. Consider: if there are 5 items, there is 1 solution. If there are 6 items instead of 4, there are *no* solutions to the OP's problem. – Tony Ennis May 27 '12 at 00:10
  • I just read up on the Bloom filter. It won't help here. The filter will look at a set of numbers and decide if there are dups. But where will the data to be tested come from? The only way to get all the permutations is a loop similar to what's coded above. What I posted generates the sets you'd test, except due to the limitations of this problem, there won't be any dups to begin with. – Tony Ennis May 27 '12 at 00:16
  • Ah, optimization... it's O(n^(k-1)) where k is the number of items. – Tony Ennis May 27 '12 at 00:19
  • I think if you look at the output you can see all 1771 results; just add 20 (the min) to each item and they should be there. I think you may be looking at combinations and not permutations. – Lostsoul May 27 '12 at 00:21
  • Yah, your lines 12 and 167 are specifically disallowed by the technique I posted. – Tony Ennis May 27 '12 at 00:33
  • So, you want the duplicates, you just don't want to have to generate them? *if* the percentages values are discrete, and *if* they have to sum to 100, then it's pretty easy and fast to generate them. What are the real-life values you care about? How many items? What are the min/max constraints per item? – Tony Ennis May 27 '12 at 00:40
  • Yes, you're right (if the numbers ultimately repeat themselves, why waste CPU time generating all of them). The numbers are discrete but don't always sum to 100 (sometimes it could be less). Honestly, I'm trying to get this running as quickly as possible so I can play around with a range that works. Ideally, I want 30-40 items with ranges within 5% of each other (so 1-5 or 5-10 or something), but if that's not possible I want to see how high I can get at what range. – Lostsoul May 27 '12 at 00:47
  • Generate the smaller list as above. It has no dups. Then _permute your items_ and apply them to the list. Since I don't know what you're trying to accomplish I don't know where to go next, but it will be a near-optimal solution, if you want to produce every possible combination and percentage of items. And it's pretty easy to code up. I'd have to think about the 'could be less than 100%' thing, however - that's a key factor. – Tony Ennis May 27 '12 at 00:52
  • I'll try that. Sorry, "could be less" was just for my own flexibility. I like putting rules in variables so I can tweak them; I don't think I would have a reason to tweak this one, I just like the flexibility to do so. If it's a problem I can forfeit it. I think I found a pattern: if the last digit is less than an existing first digit it's unique, but if the first digit is more than any previous last digit then it doesn't work, and this goes on until half the items are used. I don't know for sure, but I'm tweaking it. – Lostsoul May 27 '12 at 00:56
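
The "permute your items" step from the comments can be sketched as follows. This is only an illustration, not the commenter's code: it takes one sorted tuple and lists each distinct ordering exactly once with the standard next-permutation step. `DistinctOrderings`, `orderings` and `nextPermutation` are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DistinctOrderings {
    // Returns every distinct ordering of the values, starting from a sorted copy.
    static List<int[]> orderings(int[] values) {
        int[] a = values.clone();
        Arrays.sort(a);
        List<int[]> out = new ArrayList<>();
        do {
            out.add(a.clone());
        } while (nextPermutation(a));
        return out;
    }

    // Rearranges a into the next lexicographic ordering; returns false if a was the last one.
    private static boolean nextPermutation(int[] a) {
        int i = a.length - 2;
        while (i >= 0 && a[i] >= a[i + 1]) i--;   // find the rightmost ascent
        if (i < 0) return false;
        int j = a.length - 1;
        while (a[j] <= a[i]) j--;                 // rightmost element larger than a[i]
        swap(a, i, j);
        for (int l = i + 1, r = a.length - 1; l < r; l++, r--) swap(a, l, r); // reverse the tail
        return true;
    }

    private static void swap(int[] a, int x, int y) {
        int tmp = a[x];
        a[x] = a[y];
        a[y] = tmp;
    }

    public static void main(String[] args) {
        for (int[] o : orderings(new int[] {20, 20, 20, 40})) {
            System.out.println(Arrays.toString(o));
        }
    }
}
```

Because repeated values are never swapped past each other, [20,20,20,40] yields 4 orderings rather than 24. Applied to each of the 108 sorted tuples in the answer, the orderings should add up to the 1771 results the question expects.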

I recommend using a Bloom filter, which guarantees that false negatives are not possible:

False positives are possible, but false negatives are not; i.e. a query returns either "inside set (may be wrong)" or "definitely not in set"

A Java implementation

Source code

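Now of course the issue is comparing arrays for equality, for which I would recommend the following:

Arrays.sort(initialArray);   // both arrays must be sorted (in place) before comparing
Arrays.sort(arrayToCompare);
boolean sameElements = Arrays.equals(initialArray, arrayToCompare);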

You take the overhead of the sort, but it will tell you whether two arrays hold the same values regardless of order, and thus the Bloom filter from before will be sufficient.
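
To make the "may be wrong" versus "definitely not in set" behaviour concrete, here is a minimal Bloom filter sketch built on `java.util.BitSet`. It is not the linked implementation. `TinyBloomFilter`, its methods and the double-hashing scheme are assumptions for illustration, and the key hashes a sorted copy so that any reordering of the same values counts as the same entry, matching the sort-then-compare idea above.

```java
import java.util.Arrays;
import java.util.BitSet;

class TinyBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    TinyBloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Order-insensitive key: hash a sorted copy so [1,2,3] and [3,2,1] map to the same bits.
    private static int key(int[] values) {
        int[] copy = values.clone();
        Arrays.sort(copy);
        return Arrays.hashCode(copy);
    }

    // Derives the i-th bit position from one base hash (simple double hashing).
    private int position(int base, int i) {
        int step = (base >>> 16) | 1; // odd step so successive probes differ
        return Math.floorMod(base + i * step, size);
    }

    void add(int[] values) {
        int base = key(values);
        for (int i = 0; i < hashes; i++) bits.set(position(base, i));
    }

    boolean mightContain(int[] values) {
        int base = key(values);
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(position(base, i))) return false; // definitely never added
        }
        return true; // probably added, but this can be a false positive
    }
}
```

A `true` from `mightContain` still needs an exact check against the stored results, while a `false` can be trusted as it stands.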

Woot4Moo
  • I'm not sure this is what I'm asking. I can detect when the reversed version of an item exists in the results list, I'm trying to understand a pattern so I can know when to stop processing the list and just reverse all the results I have. – Lostsoul May 26 '12 at 23:57
  • I do not want to detect this after the results have all been generated; I want to do this during the generation process so I know when to stop and reverse the existing results I have. – Lostsoul May 27 '12 at 00:03
  • @Error_404 Ah ok, let me look into a solution for this. – Woot4Moo May 27 '12 at 00:05
  • @Error_404 wait, no, the Bloom filter would still apply, because if you check after each step of generation it would work. You should be buffering the results over as each is produced. When you find one that is not in the set (a "not in set" answer is guaranteed to be correct), you can stop. – Woot4Moo May 27 '12 at 00:07
  • I'll play around with that idea right now and let you know. I'm not 100% sure though: if you look at the results link I included, the problem is that it doesn't go from no matches to suddenly all matches (which is the problem I face). It seems to not find any matches at first, then finds a few, then doesn't find any, and continues this pattern until all new results match reversed existing results. I just need a way of identifying when that happens; just because there is a match does not mean it will continue without some non-matches. – Lostsoul May 27 '12 at 00:11
  • @Error_404 ok I will fire this up into eclipse and see what I can do. – Woot4Moo May 27 '12 at 00:16
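
A note on why the switch-over is gradual: the question's loops emit tuples in lexicographic order, so the reverse of a tuple has already been emitted exactly when the reverse sorts before the tuple itself. That condition is met by scattered tuples rather than by everything past one index. The sketch below sidesteps the search for a cut-over point by keeping only tuples that sort at or before their own reverse and mirroring them afterwards. It assumes every item shares the same bounds, as in the question's `main`, and all names (`MirrorHalf`, `isCanonical`, `canonicalHalf`, `mirrorOut`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MirrorHalf {
    // Returns a reversed copy of t.
    static int[] reversed(int[] t) {
        int[] r = new int[t.length];
        for (int i = 0; i < t.length; i++) r[i] = t[t.length - 1 - i];
        return r;
    }

    // True when t sorts at or before its own reverse in lexicographic order.
    static boolean isCanonical(int[] t) {
        for (int l = 0, r = t.length - 1; l < r; l++, r--) {
            if (t[l] != t[r]) return t[l] < t[r];
        }
        return true; // t reads the same both ways
    }

    // Enumerates offsets above the shared minimum (each at most cap, summing to rest)
    // and keeps only the canonical ones.
    static List<int[]> canonicalHalf(int items, int cap, int rest) {
        List<int[]> out = new ArrayList<>();
        fill(new int[items], 0, cap, rest, out);
        return out;
    }

    private static void fill(int[] cur, int pos, int cap, int rest, List<int[]> out) {
        if (pos == cur.length - 1) {
            if (rest <= cap) {
                cur[pos] = rest;
                if (isCanonical(cur)) out.add(cur.clone());
            }
            return;
        }
        for (int c = 0; c <= Math.min(cap, rest); c++) {
            cur[pos] = c;
            fill(cur, pos + 1, cap, rest - c, out);
        }
    }

    // Rebuilds the full result set by appending the reverse of every non-palindrome.
    static List<int[]> mirrorOut(List<int[]> half) {
        List<int[]> all = new ArrayList<>(half);
        for (int[] t : half) {
            int[] r = reversed(t);
            if (!Arrays.equals(t, r)) all.add(r);
        }
        return all;
    }

    public static void main(String[] args) {
        List<int[]> half = canonicalHalf(4, 40, 20); // 4 items, 20..60, 20 points to hand out
        System.out.println(half.size() + " canonical, " + mirrorOut(half).size() + " in total");
    }
}
```

For the question's settings this should keep 891 tuples and mirror them back to 1771. Note that the sketch still visits every tuple and only filters at the end, so the saving is in whatever is done per stored result, not in the enumeration itself.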