Python - To create 2 new column with 25th and 75th percentile of several row values

Question

Here is how my df kind of looks like (with many more rows, and many more columns):

Index	WTG1	WTG2	WTG3
1.5	61.25	-7.57	7.18
2	19.69	25.95	28.67
2.5	59.51	81.22	78.22
3	131.81	154.07	142.92

My objective is to get:

Index	WTG1	WTG2	WTG3	25th Percentile	75th Percentile	Mean
1.5	61.25	-7.57	7.18	(25th Percentile of 61.2, -7.57, 7.18)	(75th Percentile of 61.2, -7.57, 7.18)	(Avg. of 61.2, -7.57, 7.18)
2	19.6	25.95	28.67	(25th Percentile of 19.69, 25.95, 28.67)	(75th Percentile of 19.69, 25.95, 28.67)	(AVG. of 19.69, 25.95, 28.67)
2.5	59.51	81.22	78.22	(25th Percentile of 59.51, 81.22, 78.22)	(75th Percentile of 59.51, 81.22, 78.22)	(AVG. of 59.51, 81.22, 78.22)
3	131.81	154.07	142.92	(25th Percentile of 131.81, 154.07, 142.92)	(75th Percentile of 131.81, 154.07, 142.92)	(AVG. of 131.81, 154.07, 142.92)

I have been looking for a long time now and the best I can do it :

df['mean'] = df[['WTG1','WTG2','WTG3'].mean(axis=1)
df['25th Percentile'] = np.nanpercentile(df[['WTG1','WTG2','WTG3']],25)
df['75th Percentile'] = np.nanpercentile(df[['WTG1','WTG2','WTG3']],75)

The mean seems to work, have not been checking the values yet though.

But the percentiles are the real issues here... it seems that nanpercentile function works only on columns. It returns the same value on every line (which I guess is the respective 25th and 75th percentile value but of the whole df) for both percentiles columns, which is not what I attend to do.

I was able to find some alternatives but could not adapt them to my need, as:

perc75 = np.vectorize(lambda x: np.percentile(x, 75))
df['75th_percentile'] = perc75(df['WTG01'].values)

which work but only for one column.

or

df['25th_percentile'] = df['WTG1','WTG2','WTG3'].apply(lambda x: np.percentile(x, 25))

which does not work...

If I understand you correctly, you are looking for the 25th and 75th percentile of WTG1? And separate 25th and 75th percentiles for WTG2 and WTG3? — rajah9, May 20 '22 at 13:08
Nope, I'm trying to get 25th and 75th percentile for WTG1 then WTG2 then WTG3 (all 3 separated) — Lolo, May 20 '22 at 13:34

blackraven · Accepted Answer · 2022-05-20T13:46:12.360

I think you could transpose the DataFrame and apply df.describe()

import pandas as pd
df = pd.DataFrame({'WTG1': [61.25, 19.69, 59.51, 131.81],
                   'WTG2': [-7.57, 25.95, 81.22, 154.07],
                   'WTG3': [7.18, 28.67, 78.22, 142.92]
                   })
print(df)
print(df.T)

Output

     WTG1    WTG2    WTG3
0   61.25   -7.57    7.18
1   19.69   25.95   28.67
2   59.51   81.22   78.22
3  131.81  154.07  142.92

          0      1      2       3
WTG1  61.25  19.69  59.51  131.81
WTG2  -7.57  25.95  81.22  154.07
WTG3   7.18  28.67  78.22  142.92

In the question, you're trying to get the statistics for each row. After you transpose the DataFrame, you could look at columns instead, so you'd get the summary statistics of each column conveniently

print(df.T.describe())

Output

               0          1          2           3
count   3.000000   3.000000   3.000000    3.000000
mean   20.286667  24.770000  72.983333  142.933333
std    36.233778   4.604824  11.764269   11.130006
min    -7.570000  19.690000  59.510000  131.810000
25%    -0.195000  22.820000  68.865000  137.365000
50%     7.180000  25.950000  78.220000  142.920000
75%    34.215000  27.310000  79.720000  148.495000
max    61.250000  28.670000  81.220000  154.070000

Thanks, I did try to transpose as well since, as you mentioned, it easier to work with column than with row. The "issue" is that the index are reset after a transpose but I guess it's minor if at the end I manage to get the value I want. I'll try to transpose, apply the describe function and transpose again. — Lolo, May 20 '22 at 13:30
the row index and column index swap places in a transpose.. sorry i'm confused, which one got reset? — blackraven, May 20 '22 at 13:37
I just tried again after a quick fix earlier in my script, this works fine, nothing got reset. Thanks — Lolo, May 20 '22 at 13:38

Python - To create 2 new column with 25th and 75th percentile of several row values

1 Answers1