Python: Row-Based Mean for Subset of Columns

Question

I am a beginning programmer working on a project to write and deploy a specific behavioral latency calculation online, such that the app can accommodate any dataframe uploaded via CSV. The analysis requires that I take the mean of N columns within-person (i.e., within the same row) and repeat this across all respondents.

How can I take the specific instance where I perform this calculation and turn it into a general function that applies to any uploaded dataframe? For example, how can the row-based mean calculation below be written to work on N attributes/columns?

data['PersonalAverage'] = (data[[2]] + data[[4]] + data[[6]] + data[[8]] + data[[10]] + data[[12]] + data[[14]] + data[[16]] + data[[18]] + data[[20]] + data[[22]] + data[[24]] + data[[26]] + data[[28]] + data[[30]] + data[[32]] + data[[34]])/17

What if I uploaded a CSV with only 5 attributes (instead of 17)?

Can anyone point me in the right direction?

Any reason for not using numpy? It comes with a mean function and functionality to select specific rows. — Markus M., Apr 20 '16 at 03:54

Sagar Waghmode · Answer 1 · 2016-04-20T04:52:16.607

You can use DataFrame.mean with axis=1 for this: select the columns you want, then take the mean across each row.

In [3]: df
Out[3]: 
   a  b  c  d
0  1  2  3  4
1  4  5  6  7
2  2  4  6  8
3  3  2  1  4
4  0  1  2  4

In [4]: cols = ['a', 'b', 'd']    # Columns to consider for average

In [5]: df['mean'] = df[cols].mean(axis=1)

In [7]: df
Out[7]: 
   a  b  c  d      mean
0  1  2  3  4  2.333333
1  4  5  6  7  5.333333
2  2  4  6  8  4.666667
3  3  2  1  4  3.000000
4  0  1  2  4  1.666667
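
To connect this back to the question's request for a general function, the same idea can be wrapped up roughly as follows. This is only a sketch: the function name add_row_mean and its new_col parameter are made up for illustration, and it assumes the selected columns are numeric. The last line shows positional selection with iloc, which may be closer to the every-second-column pattern in the question.

```python
import pandas as pd


def add_row_mean(df, cols, new_col="PersonalAverage"):
    """Return a copy of df with the row-wise mean of `cols` stored in `new_col`.

    `add_row_mean` and `new_col` are hypothetical names used for illustration.
    """
    out = df.copy()
    # axis=1 averages across the selected columns within each row;
    # missing values (NaN) are skipped by default.
    out[new_col] = out[cols].mean(axis=1)
    return out


# Same example frame as in the answer above
df = pd.DataFrame({
    "a": [1, 4, 2, 3, 0],
    "b": [2, 5, 4, 2, 1],
    "c": [3, 6, 6, 1, 2],
    "d": [4, 7, 8, 4, 4],
})

# Works for any number of columns: pass 3 names, 5 names, or 17
result = add_row_mean(df, ["a", "b", "d"])

# Selecting by position instead of by name:
# every second column starting at position 1 (here: b and d)
by_position = df.iloc[:, 1::2].mean(axis=1)
```

Because the column list is an argument, the number of attributes in the uploaded CSV no longer needs to be hard-coded.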

score 0 · Answer 2 · answered Sep 22 '20 at 17:21

df['mean'] = df.mean(axis=1)

This should do the trick. However, if the first column holds an object-type description (for example, a text label), you can skip it and compute the mean over the remaining columns:

df['mean'] = df.iloc[:,1:].mean(axis=1)
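
If the non-numeric columns are not reliably in the first position (which seems likely for arbitrary uploaded CSVs), one alternative is to pick the numeric columns by dtype instead of by position. The column names below are invented for illustration. Note that, depending on the pandas version, calling df.mean(axis=1) on a frame that mixes text and numbers may raise an error rather than silently skipping the text column, so selecting the numeric columns explicitly tends to be safer.

```python
import pandas as pd

# Hypothetical uploaded data: one text column and two numeric columns
df = pd.DataFrame({
    "label": ["p1", "p2", "p3"],
    "t1": [1.0, 4.0, 2.0],
    "t2": [3.0, 6.0, 4.0],
})

# Keep only numeric columns, wherever they appear in the frame
numeric = df.select_dtypes(include="number")

# Row-wise mean over the numeric columns only
df["mean"] = numeric.mean(axis=1)
```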

Rich
