
I am attempting to find the Cartesian product of several pools while applying specific criteria.

I have four pools of 25 people each. Each person has a score and a price, and each entry in a pool looks like this:

[0] => array(
    "name" => "jacob",
    "price" => 15,
    "score" => 100
),
[1] => array(
    "name" => "daniel",
    "price" => 22,
    "score" => 200
)

I want to find the best combination of people (the one with the highest total score), picking exactly one person from each pool. However, there is a price ceiling: the total price of a group cannot exceed a fixed limit.

I have been experimenting with Cartesian-product and permutation functions but cannot figure out how to do this. The only way I know how to code it is with nested foreach loops, and that is very expensive.

As you can see, the code below is very inefficient, especially as the number of pools grows!

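$greatest = array();   // best group found so far
$greatestScore = -1;   // $price_limit holds the price ceiling

foreach ($poolA as $vA) {
    foreach ($poolB as $vB) {
        foreach ($poolC as $vC) {
            foreach ($poolD as $vD) {
                // calculate total price and check if valid
                $price = $vA['price'] + $vB['price'] + $vC['price'] + $vD['price'];
                if ($price > $price_limit) {
                    continue;
                }
                // calculate total score and check if greatest
                $score = $vA['score'] + $vB['score'] + $vC['score'] + $vD['score'];
                if ($score > $greatestScore) {
                    // if so, store this group in $greatest
                    $greatestScore = $score;
                    $greatest = array($vA, $vB, $vC, $vD);
                }
            }
        }
    }
}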

I also thought I could compute each person's price/score ratio and use that to my advantage, but I can't see how.

Jacob Raccuia
  • You can probably make an improvement to the algorithm by sorting the people in each pool by price. When you reach the one whose price puts you over the limit, you don't have to try the rest in that pool. – Barmar Nov 17 '16 at 04:34
  • @Barmar that is genius! That should definitely cut down on many loops. Thank you. – Jacob Raccuia Nov 17 '16 at 04:35
  • Your brute force approach has nothing to do with permutation (and shouldn't). – user unknown Nov 18 '16 at 19:24

4 Answers


As pointed out by Barmar, sorting the people in each pool lets you halt the loops early once the total price exceeds the limit, which reduces the number of cases you need to check. However, the worst-case asymptotic complexity is still O(n^4), where n is the number of people in a pool.
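Barmar's pruning idea might be sketched as follows. This is an illustration under assumptions: each person is an array with integer `price` and `score` keys as in the question, prices are non-negative, and the function name `bestGroupPruned` is hypothetical. Each pool is sorted by ascending price, so once a person pushes the running total over the limit, every later person in that pool would too, and the loop can `break`.

```php
<?php
// Hypothetical helper: brute force over four pools with Barmar's early exit.
// Assumes each person is ['name' => ..., 'price' => int, 'score' => int]
// and that prices are non-negative (otherwise the breaks are not safe).
function bestGroupPruned(array $pools, int $limit): ?array
{
    // Sort every pool by ascending price so the loops can stop early.
    foreach ($pools as &$pool) {
        usort($pool, function ($x, $y) { return $x['price'] <=> $y['price']; });
    }
    unset($pool);

    [$pA, $pB, $pC, $pD] = $pools;
    $best = null;
    $bestScore = PHP_INT_MIN;
    foreach ($pA as $a) {
        if ($a['price'] > $limit) {
            break; // every later person in pool A costs at least as much
        }
        foreach ($pB as $b) {
            $ab = $a['price'] + $b['price'];
            if ($ab > $limit) {
                break;
            }
            foreach ($pC as $c) {
                $abc = $ab + $c['price'];
                if ($abc > $limit) {
                    break;
                }
                foreach ($pD as $d) {
                    if ($abc + $d['price'] > $limit) {
                        break; // the rest of pool D is at least as expensive
                    }
                    $score = $a['score'] + $b['score'] + $c['score'] + $d['score'];
                    if ($score > $bestScore) {
                        $bestScore = $score;
                        $best = [$a, $b, $c, $d];
                    }
                }
            }
        }
    }
    return $best; // null when no group fits under the limit
}
```

The breaks only cut the average case; a generous limit still visits all n^4 groups.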

I will outline an alternative approach with better asymptotic complexity as follows:

  1. Construct a pool X that contains all pairs of people with one from pool A and the other from pool B.
  2. Construct a pool Y that contains all pairs of people with one from pool C and the other from pool D.
  3. Sort the pairs in pool X by total price. Then for any pairs with the same price, retain the one with the highest score and discard the remaining pairs.
  4. Sort the pairs in pool Y by total price. Then for any pairs with the same price, retain the one with the highest score and discard the remaining pairs.
  5. Run a two-pointer loop over all combinations that satisfy the price constraint: the head pointer starts at the first item of pool X, and the tail pointer starts at the last item of pool Y. Sample code is given below to illustrate how this loop works:


$head = 0;
$tail = sizeof($poolY) - 1;

while ($head < sizeof($poolX) && $tail >= 0) {
    $total_price = $poolX[$head]['price'] + $poolY[$tail]['price'];

    // Your logic goes here...

    if ($total_price > $price_limit) {
        $tail--;
    } else if ($total_price < $price_limit) {
        $head++;
    } else {
        $head++;
        $tail--;
    }
}

for ($i = $head; $i < sizeof($poolX); $i++) {
    // Your logic goes here...
}

for ($i = $tail; $i >= 0; $i--) {
    // Your logic goes here...
}
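A complete, runnable version of steps 1 to 5 might look like the sketch below. It assumes people are arrays with integer `price` and `score` keys, and the helper names `pairUp`, `frontier` and `bestCombination` are hypothetical. It goes one step further than the dedup in steps 3 and 4: after sorting, a pair is also dropped when some cheaper pair scores at least as much. Pool Y then has scores that rise with price, so for each pair in X the best compatible pair in Y is simply the most expensive one that still fits, and the tail pointer only ever moves down.

```php
<?php
// Steps 1 and 2: every pair (one person from each pool) with combined price/score.
function pairUp(array $p, array $q): array
{
    $pairs = [];
    foreach ($p as $a) {
        foreach ($q as $b) {
            $pairs[] = [
                'price'  => $a['price'] + $b['price'],
                'score'  => $a['score'] + $b['score'],
                'people' => [$a, $b],
            ];
        }
    }
    return $pairs;
}

// Steps 3 and 4: sort by price (higher score first on ties) and keep a pair only
// if it outscores every cheaper pair kept so far.
function frontier(array $pairs): array
{
    usort($pairs, function ($x, $y) {
        $byPrice = $x['price'] <=> $y['price'];
        return $byPrice !== 0 ? $byPrice : $y['score'] <=> $x['score'];
    });
    $kept = [];
    foreach ($pairs as $pair) {
        if (!$kept || $pair['score'] > $kept[count($kept) - 1]['score']) {
            $kept[] = $pair;
        }
    }
    return $kept;
}

// Step 5: head walks up X (cheapest first) while tail walks down Y.
function bestCombination(array $pools, int $limit): ?array
{
    $x = frontier(pairUp($pools[0], $pools[1]));
    $y = frontier(pairUp($pools[2], $pools[3]));
    $best = null;
    $tail = count($y) - 1;
    foreach ($x as $px) {
        // Move tail down until X[head] + Y[tail] fits under the limit.
        while ($tail >= 0 && $px['price'] + $y[$tail]['price'] > $limit) {
            $tail--;
        }
        if ($tail < 0) {
            break; // nothing in Y fits with this (or any pricier) X pair
        }
        $score = $px['score'] + $y[$tail]['score'];
        if ($best === null || $score > $best['score']) {
            $best = [
                'price'  => $px['price'] + $y[$tail]['price'],
                'score'  => $score,
                'people' => array_merge($px['people'], $y[$tail]['people']),
            ];
        }
    }
    return $best; // null when no group fits under the limit
}
```

Any split of the four pools into two pairs works; only the asymptotic cost matters here.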


The complexity of steps 1 and 2 is O(n^2), and steps 3 and 4 can be done in O(n^2 log(n)) using a balanced binary tree. Step 5 is essentially a linear scan over n^2 items, so its complexity is also O(n^2). Therefore the overall complexity of this approach is O(n^2 log(n)).

chiwangc
  • This is a cool solution. A little unrelated, but what's the best way to find the combination of two lists? Is it still a foreach in a foreach? When I needed to do it for one list, I had split the list in half and looped through both. Also, what do the last two for loops do? – Jacob Raccuia Nov 17 '16 at 14:37
  • @JacobRaccuia (1) There are `mn` pairs in total for lists of size `m` and `n`, so producing them requires `Ω(mn)` time (just to print the output); two nested for loops are already the best you can do. (2) Note that either `$head` may not reach the end of `$poolX` or `$tail` may not reach the start of `$poolY`, so the last two for loops ensure that all possible cases are scanned. – chiwangc Nov 17 '16 at 15:46
  • I see. That's because `$head` and `$tail` can be different lengths. If they are the same length, then they won't trigger? – Jacob Raccuia Nov 17 '16 at 17:28
  • @JacobRaccuia The pointers in the `while` loop are updated based on how `$total_price` compares with `$price_limit`, so `$head` may reach the end of `$poolX` while `$tail` is still in the middle of `$poolY`, or vice versa. This is not directly related to the lengths of `$poolX` and `$poolY`. – chiwangc Nov 18 '16 at 04:12
  • I'm not gonna lie, I still don't understand this. I put my data in, but I don't know how to find out which `$poolX` and `$poolY` will yield the best combination. – Jacob Raccuia Nov 18 '16 at 19:25
  • It is not necessary for you to explicitly pick `$poolA` etc. to find the best combination. You can simply pick any two pools to form `$poolX` and the others to form `$poolY`, this will still give you the asymptotic guarantee. In fact, you will want this code to work with arbitrary data set, so it is not possible for you to split the data to find the best grouping in general. – chiwangc Nov 19 '16 at 00:25
  • @JacobRaccuia If you still have trouble understanding this approach, let me know what you are puzzling over and I will update my solution to help you understand it better. – chiwangc Nov 19 '16 at 00:27

A couple of things to note about your approach here. Speaking strictly from a mathematics perspective, you're calculating way more permutations than is actually necessary to arrive at a definitive answer.

In combinatorics, there are two important questions to ask in order to arrive at the exact number of permutations necessary to yield all possible combinations.

  1. Does order matter? (for your case, it does not)
  2. Is repetition allowed? (for your case, it is not necessary to repeat)

Since the answer to both of these questions is no, you need only a fraction of the iterations you're currently doing with your nested loop. Currently you are computing pow(25, 4) permutations, which is 390625. You only actually need n! / (r! (n-r)!), or gmp_fact(25) / (gmp_fact(4) * gmp_fact(25 - 4)), which is only 12650 combinations in total.

Here's a simple example of a function that produces combinations without repetition (and where order does not matter), using a generator in PHP (taken from this SO answer).

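// Yield every subset of size $m from array $a (no repetition, order ignored).
function comb($m, $a) {
    if (!$m) {
        yield [];   // choosing 0 items: the empty combination
        return;
    }
    if (!$a) {
        return;     // nothing left to choose from
    }
    $h = $a[0];
    $t = array_slice($a, 1);
    // combinations that include the head element
    foreach (comb($m - 1, $t) as $c) {
        yield array_merge([$h], $c);
    }
    // combinations that skip the head element
    foreach (comb($m, $t) as $c) {
        yield $c;
    }
}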

$a = range(1,25); // 25 people in each pool
$n = 4; // 4 pools

foreach(comb($n, $a) as $i => $c) {
    echo $i, ": ", array_sum($c), "\n";
}

It would be pretty easy to modify the generator function to check whether the sum of prices meets or exceeds the desired threshold and only return valid results from there (i.e. abandoning a branch early where needed).
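Following that suggestion, a budget-aware variant of `comb` might look like this. It is a sketch: the name `combWithin`, the extra `$budget` parameter, and the assumption that each item is an array with a non-negative `price` key are hypothetical additions. The remaining budget is passed down the recursion, so a branch is abandoned as soon as the head item no longer fits.

```php
<?php
// Yield every size-$m subset of $a whose total 'price' is at most $budget.
// Assumes prices are non-negative, so an over-budget partial choice can be dropped.
function combWithin($m, array $a, $budget)
{
    if (!$m) {
        yield [];
        return;
    }
    if (!$a) {
        return;
    }
    $h = $a[0];
    $t = array_slice($a, 1);
    if ($h['price'] <= $budget) {
        // Include the head and spend part of the budget on it.
        foreach (combWithin($m - 1, $t, $budget - $h['price']) as $c) {
            yield array_merge([$h], $c);
        }
    }
    // Skip the head; the budget is unchanged.
    foreach (combWithin($m, $t, $budget) as $c) {
        yield $c;
    }
}
```

Like `comb`, this draws every member from a single list; it prunes by price but does not by itself enforce one member per pool.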

Repetition and order do not matter for your use case because it makes no difference whether you add $price1 + $price2 or $price2 + $price1; the result is the same either way. So you only need to add up each unique set once to obtain all possible sums.

Sherif
  • Thank you. I will look through this. In my specific case, there are a different number of people in each pool. How do I account for that? I am also a little confused as to how to use this function. Where do I put my four arrays that hold the data? (Would combining the data into one array be easier?) – Jacob Raccuia Nov 17 '16 at 05:38
  • It doesn't matter how many people you have in a pool. The solution doesn't account for making selections from each set, but rather uniquely combines all available selections from a set to a given size (i.e. `$m`). If you want to combine all members of the set into combinations of 4, this would do it such that no member is repeated. If you require a unique filtering criterion of abstaining from in-set combinatorics, you can impose such a filter from within the generator function itself, although I find it may be unnecessary. – Sherif Nov 17 '16 at 06:12
  • I still don't understand where my arrays are in your example. How do I call them in the function? – Jacob Raccuia Nov 17 '16 at 06:25

Similar to chiwangc's solution, you can eliminate up front every member of a group for which another member of the same group has the same or a higher score at a lower price. This may eliminate many members in each group.
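That up-front elimination could be sketched like this, assuming each member is an array with `price` and `score` keys; the name `dropDominated` is hypothetical. After sorting by price (higher score first when prices tie), a member survives only if it scores strictly more than every cheaper member kept so far.

```php
<?php
// Remove every member that another member beats: same or higher score at a lower
// price, or a higher score at the same price. Members with identical price and
// score are collapsed to one. Assumes 'price' and 'score' keys.
function dropDominated(array $group): array
{
    usort($group, function ($x, $y) {
        $byPrice = $x['price'] <=> $y['price'];
        return $byPrice !== 0 ? $byPrice : $y['score'] <=> $x['score'];
    });
    $kept = [];
    foreach ($group as $member) {
        // Keep only if it outscores the last (best-scoring) member kept so far.
        if (!$kept || $member['score'] > $kept[count($kept) - 1]['score']) {
            $kept[] = $member;
        }
    }
    return $kept;
}
```

The surviving members are ordered by strictly increasing price and strictly increasing score.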

You may then either use this technique to build pairs and repeat the filtering (eliminate pairs for which another pair exists with a higher score at the same or a lower cost) and then combine the pairs the same way, or add one member at a time (a pair, then a triple, then a quartet).

Any member whose price alone exceeds the allowed total can be eliminated up front.

If you order the 4 groups by score, descending, and find a solution abcd whose total price is legal, you have found the optimal solution for that particular choice of abc.
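A sketch of that observation, again assuming `price`/`score` keys; `bestByScoreOrder` is a hypothetical name. Only pool D needs to be sorted by score, highest first: for each choice of a, b and c, the first d that fits the budget is the best d for that triple, so the innermost loop can stop there.

```php
<?php
// For each (a, b, c), scan pool D from highest to lowest score; the first d that
// fits the budget is the best d for that triple, so the inner loop stops there.
function bestByScoreOrder(array $pools, int $limit): ?array
{
    [$pA, $pB, $pC, $pD] = $pools;
    usort($pD, function ($x, $y) { return $y['score'] <=> $x['score']; });

    $best = null;
    $bestScore = PHP_INT_MIN;
    foreach ($pA as $a) {
        foreach ($pB as $b) {
            foreach ($pC as $c) {
                $abc = $a['price'] + $b['price'] + $c['price'];
                foreach ($pD as $d) {
                    if ($abc + $d['price'] <= $limit) {
                        $score = $a['score'] + $b['score'] + $c['score'] + $d['score'];
                        if ($score > $bestScore) {
                            $bestScore = $score;
                            $best = [$a, $b, $c, $d];
                        }
                        break; // later d's score the same or lower
                    }
                }
            }
        }
    }
    return $best; // null when no group fits under the limit
}
```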

user unknown

The responses here helped me figure out the best way for me to do this.

I haven't optimized the function yet, but essentially I combined the pools two at a time, computing the combined salary and score for every pair of people from the two pools.

I stored each combined salary -> score combination in a new array keyed by the combined salary; if that salary already existed, I compared the scores and kept only the higher one.

$results = array();
foreach ($poolA as $A) {
    foreach ($poolB as $B) {
        $total_salary = $A['Salary'] + $B['Salary'];
        $total_score  = $A['Score'] + $B['Score'];
        $pids = array($A['pid'], $B['pid']);

        // Keep only the highest-scoring pair for each total salary
        if (isset($results[$total_salary])) {
            if ($total_score > $results[$total_salary]['Score']) {
                $results[$total_salary]['Score'] = $total_score;
                $results[$total_salary]['pid']   = $pids;
            }
        } else {
            $results[$total_salary]['Score'] = $total_score;
            $results[$total_salary]['pid']   = $pids;
        }
    }
}

After this loop, I have another one that is identical, except that it combines $results with $poolC:

foreach ($results as $salary => $R) {   // the key is the combined salary so far
    foreach ($poolC as $C) {

Finally, I do it one last time for $poolD.

I am working on optimizing the code by putting all four foreach loops into one.
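One way to fold the repeated loops into a single routine that handles any number of pools is sketched below. It keeps the idea above (for each total salary, remember only the highest-scoring partial group) and adds one pruning step of its own: partial groups already over a hypothetical `$cap` are dropped, which is only safe if salaries are non-negative. Salaries are assumed to be integers, since they are used as array keys. The name `bestAcrossPools` is hypothetical; the `Salary`, `Score` and `pid` keys follow the code above.

```php
<?php
// Combine any number of pools one at a time. After each step, $results maps a
// total (integer) salary to the best-scoring partial group with that salary.
function bestAcrossPools(array $pools, int $cap): ?array
{
    $results = [0 => ['Score' => 0, 'pid' => []]]; // the empty group costs 0
    foreach ($pools as $pool) {
        $next = [];
        foreach ($results as $salary => $R) {
            foreach ($pool as $P) {
                $total_salary = $salary + $P['Salary'];
                if ($total_salary > $cap) {
                    continue; // over budget (assumes non-negative salaries)
                }
                $total_score = $R['Score'] + $P['Score'];
                if (!isset($next[$total_salary]) || $total_score > $next[$total_salary]['Score']) {
                    $next[$total_salary] = [
                        'Score' => $total_score,
                        'pid'   => array_merge($R['pid'], [$P['pid']]),
                    ];
                }
            }
        }
        $results = $next;
    }
    if (!$results) {
        return null; // no group fits under the cap
    }
    // Pick the best score among all remaining total salaries.
    $best = null;
    foreach ($results as $salary => $R) {
        if ($best === null || $R['Score'] > $best['Score']) {
            $best = ['Salary' => $salary] + $R;
        }
    }
    return $best;
}
```

Because each step keeps at most one entry per distinct salary, the intermediate arrays stay bounded by the number of distinct totals under the cap rather than growing with every extra pool.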

Thank you, everyone, for your help. I was able to loop through 9 lists with 25+ people in each and find the best result in a very short processing time!

Jacob Raccuia