Finding Largest Subset of Data where Average Matches Criteria

Question

I'm trying to find the largest subset sum of a particular data set, where the average of a field in the data set matches predetermined criteria.

For example, say I have people's weights (example below), and my goal is to find the largest weight total of a group whose average weight is between 200 and 201 pounds.

210
201
190
220
188

Using the above, the largest sum of weights where the average weight is between 200 and 201 pounds is from persons 1, 2, and 3. The sum of their weights is 601, and their average weight is about 200.3.

Is there a way to program this, other than brute force, preferably in Python? I'm not even sure where to start researching this, so any help or guidance is appreciated.

score 1 · Accepted Answer · answered Feb 26 '19 at 22:08

Start by translating the desired range so that it starts at 0, just for convenience: subtract the chosen offset from every weight. I'll translate to the lower bound (200), although the midpoint is also a good choice.

This makes your data set [10, 1, -10, 20, -12]. The full set sums to 9; for a subset of k elements, you need the translated sum to be in the range 0 to k * (upper_bound - lower_bound), which here is 0 to k.
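To make the translation concrete, here is a minimal sketch using the weights from the question. The names `weights`, `lo`, `hi`, and `average_in_range` are made up for illustration and are not part of the original answer.

```python
# Example data from the question; lo and hi are the allowed average range.
weights = [210, 201, 190, 220, 188]
lo, hi = 200, 201

# Shift every weight by the lower bound.
shifted = [w - lo for w in weights]


def average_in_range(subset):
    """Return True if the mean of `subset` (original weights) lies in [lo, hi]."""
    k = len(subset)
    s = sum(w - lo for w in subset)
    # lo <= mean <= hi  is equivalent to  0 <= sum(w - lo) <= k * (hi - lo)
    return k > 0 and 0 <= s <= k * (hi - lo)


print(shifted)                            # [10, 1, -10, 20, -12]
print(average_in_range([210, 201, 190]))  # True
```

Working with the shifted sum avoids any division, so the check stays exact for integer weights.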

This gives you a tractable variation of the "target sum" problem: find a subset of the list that satisfies the sum constraint. In this case, the largest solution is [10, 1, -10], whose sum of 1 falls within 0 to 3; [10, 1, -12] just misses, since its sum is -1. You can find this by enhancing the customary target-sum solutions to handle the changing bound: the "remaining amount" has to account for the subset size, because the allowed range grows with every element you add.

Can you finish from there?
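One possible way to finish, offered as a sketch rather than as the method the answer had in mind: a dynamic program over reachable (subset size, translated sum) pairs. It assumes integer weights and bounds, and the function name `best_group` is hypothetical.

```python
def best_group(weights, lo, hi):
    """Return (total, members) for the heaviest subset whose mean is in [lo, hi].

    Assumes integer weights and bounds. Returns (0, []) if no subset qualifies.
    """
    # Maps (size, shifted_sum) -> one subset (tuple of weights) that reaches it.
    states = {(0, 0): ()}
    for w in weights:
        d = w - lo
        # Iterate over a snapshot so each weight is used at most once.
        for (k, s), members in list(states.items()):
            key = (k + 1, s + d)
            if key not in states:
                states[key] = members + (w,)

    best_total, best_members = 0, []
    for (k, s), members in states.items():
        # Feasible when 0 <= shifted sum <= k * (hi - lo).
        if k > 0 and 0 <= s <= k * (hi - lo):
            total = s + k * lo  # undo the shift to get the real weight total
            if total > best_total:
                best_total, best_members = total, list(members)
    return best_total, best_members


print(best_group([210, 201, 190, 220, 188], 200, 201))
# (601, [210, 201, 190])
```

The work is bounded by the number of distinct (size, sum) pairs rather than by the number of subsets, so this should scale far better than brute force when the weights are integers of modest magnitude. Keeping one member tuple per state costs extra memory; storing back-pointers instead would be a reasonable refinement.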

thanks, this is closer to what I'm looking for. I think I'll be able to work with this. thank you! — Epausti, Feb 27 '19 at 14:41

score 0 · Answer 2 · answered Feb 26 '19 at 22:07

There are many ways to do this, but Pandas is your friend.

import pandas as pd

df = pd.DataFrame({'weight':[209, 203, 190, 220, 188, 193]})
df = df.rolling(3).mean()
df.query('200 <= weight <= 201').max()

In this case we create a dataframe from our weights. We then take a rolling average of every 3 weights. From this we get the max average between 200 and 201 lbs.

output:

weight    200.666667
dtype: float64

Chris

This finds only subsequences, not subsets. Also, we're trying to determine the largest such subsequence, which would require iterating the `3` argument from the df size down to 1, until a solution is found. – Prune Feb 26 '19 at 22:09
in the question there is no restriction to 3 or to a window. if i understand it correctly, shuffling the input shall yield the same result – thejonny Feb 26 '19 at 22:10
Yes I read that wrong, leaving in case the use case is ever helpful :) – Chris Feb 26 '19 at 22:19
thanks for the response - i agree with the comments but I appreciate the reply. – Epausti Feb 27 '19 at 14:41
