Weighted Means for columns in Pandas DataFrame including Nan

Question

I am trying to get the weighted mean for each column (A-F) of a Pandas.Dataframe with "Value" as the weight. I can only find solutions for problems with categories, which is not what I need.

The comparable solution for normal means would be

df.means()

Notice the df has Nan in the columns and "Value".

   A      B         C      D      E      F    Value
0  17656  61496     83     80    117     99   2902804
1  75078  61179     14      3      6     14   3761964
2  21316  60648     86    Nan    107     93   127963
3  6422   48468  28855  26838  27319  27011   131354
4  12378  42973  47153  46062  46634  42689   3303909572
5  54292  35896     59      6      3     18   27666367
6  21272  Nan      126     12      3      5   9618047
7  26434  35787    113     17      4      8   309943
8  10508  34314  34197   7100     10     10   NaN

I can use this for a single column.

df1 = df[['A','Value']]
df1 = df1.dropna()
np.average(df1['A'], weights=df1['Value'])

There must be a simple method. It's driving me nuts I don't see it.

I would appreciate any help.

score 2 · Accepted Answer · answered Aug 17 '17 at 19:08

2

You could use masked arrays. We could dropoff rows where Value column has NaN values.

In [353]: dff = df.dropna(subset=['Value'])

In [354]: dff.apply(lambda x: np.ma.average(
                    np.ma.MaskedArray(x, mask=np.isnan(x)), weights=dff.Value))
Out[354]:
A        1.282629e+04
B        4.295120e+04
C        4.652817e+04
D        4.545254e+04
E        4.601520e+04
F        4.212276e+04
Value    3.260246e+09
dtype: float64

answered Aug 17 '17 at 19:08

Zero

74,117
18
147
154

That worked great, I have never used MaskedArray before. – TChi Aug 17 '17 at 20:29
I'm curious. How would you solve this if there was no missing data? – TChi Aug 17 '17 at 20:31
If there is no missing data you could directly do `df.apply(lambda x: np.average(x, weights=df.Value))`? – Zero Aug 18 '17 at 07:31
Thank you, Again! – TChi Aug 18 '17 at 20:29

Weighted Means for columns in Pandas DataFrame including Nan

1 Answers1