Calculate weighted average with Pandas for decreasing cost

Question

I am building a ranking system. One of its criteria is a field called site_fees, which accounts for 10% of the total score. A site fee of 0 gets all 10 points. What I want to calculate is how many points sites with non-zero fees should get, but I am struggling to do so.

My initial approach was to split the dataframe into two dataframes (dfb, where site_fees is 0, and dfa, where it is > 0), calculate the rating for dfa, assign dfb a rating of 10, and then concatenate the two.

The code is as follows:

dfSitesa = dfSites[dfSites['site_fees'].notnull()]
dfSitesb = dfSites[dfSites['site_fees'].isnull()]
dfSitesa['rating'] = FeeWeight * \
dfSitesa['site_fees'].min()/dfSitesa['site_fees']
dfSitesb['rating'] = FeeWeight
dfSites = pd.concat([dfSitesa,dfSitesb])

This produces an output, but the ratings in dfa are wrong: the minimum fee in dfa is 5000, not 0, so a site with $5000 in fees gets min/fee = 1 and therefore the maximum rating of 10, which is not what I want. What am I doing wrong?
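To see why, here is the question's formula applied to fees of 5000, 10000, and 15000 (the non-zero values the question lists). This is a minimal sketch using only that formula: the cheapest non-zero fee always maps to the full weight, and the rating falls off as 1/fee rather than linearly.

```python
import pandas as pd

FeeWeight = 10
# The non-zero fees from the question; zero-fee rows live in the other frame
fees = pd.Series([5000, 10000, 15000])

# The question's formula: rating is inversely proportional to the fee,
# so whichever fee is smallest in this frame always gets FeeWeight
rating = FeeWeight * fees.min() / fees
print(rating.round(2).tolist())  # [10.0, 5.0, 3.33]
```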

The minimum non-zero site_fee is 5000 and the maximum is 15000. Based on this, I would expect a ranking like this:

site_fees | rating
15000     | 0
10000     | 3.3
5000      | 6.6
0         | 10
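The expected table is a linear scale that gives the full weight $w$ (here FeeWeight = 10) to a fee of 0 and nothing to the maximum fee. With $f$ the site fee, this is min-max scaling in which the minimum is taken over all sites, including the zero-fee ones, so $f_{\min} = 0$ rather than 5000:

```latex
r(f) = w\left(1 - \frac{f - f_{\min}}{f_{\max} - f_{\min}}\right), \qquad w = 10,\; f_{\min} = 0,\; f_{\max} = 15000

r(10000) = 10\left(1 - \frac{10000}{15000}\right) = \frac{10}{3} \approx 3.33

r(5000) = 10\left(1 - \frac{5000}{15000}\right) = \frac{20}{3} \approx 6.67
```

Taking $f_{\min}$ over the non-zero fees only (5000) would instead give a 5000 fee the full 10 points.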

Can you include an MCVE (minimal, complete, and verifiable example) with the expected output? That will make sure people give you answers that do what you want. – ASGM, May 15 '19 at 13:24
This didn't work. The percentage of a site fee of $5000 is 0.000202. The inverse of that is 4958.26, multiplying by fee weight is 49582 — DBA108642, May 15 '19 at 13:25

score 1 · Accepted Answer · answered May 15 '19 at 13:11


Here is one way to do it:

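import pandas as pd

dfSites = pd.DataFrame({'site_fees': [0, 1, 2, 3, 5]})
FeeWeight = 10

# .copy() makes independent frames, avoiding SettingWithCopyWarning below
dfSitesa = dfSites[dfSites['site_fees'].notnull()].copy()
dfSitesb = dfSites[dfSites['site_fees'].isnull()].copy()
# Sites without a fee get the full weight
dfSitesb['rating'] = FeeWeight
# Min-max scale: the lowest fee gets FeeWeight, the highest gets 0
factor = dfSitesa['site_fees'].max() - dfSitesa['site_fees'].min()
dfSitesa['rating'] = FeeWeight * (1 - (dfSitesa['site_fees'] - dfSitesa['site_fees'].min()) / factor)
dfSites = pd.concat([dfSitesa, dfSitesb])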

In [1] : print(dfSites)
Out[1] : 
   site_fees  rating
0          0    10.0
1          1     8.0
2          2     6.0
3          3     4.0
4          5     0.0
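The formula above is ordinary min-max scaling, inverted so that the lowest fee scores highest. As a sketch, it can also be written as a small helper applied to the whole column at once. `min_max_rating` is a hypothetical name, and the guard for a column where every fee is identical (where `factor` would be 0) is an addition not present in the answer.

```python
import pandas as pd


def min_max_rating(fees: pd.Series, weight: float = 10) -> pd.Series:
    """Hypothetical helper: map the lowest fee to `weight` and the highest to 0, linearly."""
    lo, hi = fees.min(), fees.max()
    if hi == lo:
        # All fees identical: avoid dividing by zero and give every site full points
        return pd.Series(float(weight), index=fees.index)
    return weight * (1 - (fees - lo) / (hi - lo))


dfSites = pd.DataFrame({'site_fees': [0, 1, 2, 3, 5]})
dfSites['rating'] = min_max_rating(dfSites['site_fees'])
print(dfSites)
```

On the answer's sample data this gives the same ratings as the split-and-concatenate version.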


vlemaistre


So I tried this, and I'm still getting a rating of 10 for a site_fee of 5000. Since 5000 is the minimum non-zero site fee and 15000 is the maximum, I would expect a rating of about 6.6 – DBA108642 May 15 '19 at 13:19
Disregard my comment. This worked; I just had to replace the NaN values with 0s – DBA108642 May 15 '19 at 13:39
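Putting the answer together with that fix: once the NaN fees are filled with 0, the minimum becomes 0 instead of 5000 and the min-max formula reproduces the expected table. A minimal sketch, assuming (as the `notnull()`/`isnull()` split suggests) that zero-fee sites were stored as NaN; the sample values are illustrative. Because every row now has a fee, the split into dfSitesa and dfSitesb is no longer needed.

```python
import numpy as np
import pandas as pd

FeeWeight = 10
# Illustrative data: NaN stands for a site with no fee, as in the question's frame
dfSites = pd.DataFrame({'site_fees': [15000, 10000, 5000, np.nan]})

# Replace NaN with 0 so the zero-fee sites take part in the min/max
dfSites['site_fees'] = dfSites['site_fees'].fillna(0)

fees = dfSites['site_fees']
factor = fees.max() - fees.min()  # 15000 - 0
dfSites['rating'] = FeeWeight * (1 - (fees - fees.min()) / factor)
print(dfSites.round(2))
```

This gives ratings of 0, 3.33, 6.67, and 10 for fees of 15000, 10000, 5000, and 0.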

