How to compute a z-score based on one column and then apply it to that same column in Python?

Question

I have a pandas DataFrame with 5 columns: X, Y, Z, Value1, Value2.

I want to compute the z-score based on column Value1 and then apply it. I can't figure out how to do it properly. I have tried both

from scipy import stats
z_score = np.abs(stats.zscore(df["Value1"]))
df["Value1"] = df["Value1"][(z_score < 3).all(axis=1)]

and

from scipy.stats import zscore
df["Value1"].apply(zscore)

but neither seems to work properly. I'm not sure what to do, since I get either a

KeyError: False or IndexError: tuple index out of range.

I realize that I messed up the format. Not sure what I am doing wrong exactly — Zygos, Dec 09 '19 at 14:06
It's not clear exactly what you're trying to do. Are you trying to filter? `df.loc[z_score < 3]` Or are you checking whether all values are within the threshold of 3? `np.all(z_score < 3)` — Chris Adams, Dec 09 '19 at 14:29
Not sure but this [answer](https://stackoverflow.com/a/57162033/10140310) could help. @Zygos — help-ukraine-now, Dec 09 '19 at 14:59
Does this answer your question? [Is there function that can remove the outliers?](https://stackoverflow.com/questions/57161413/is-there-function-that-can-remove-the-outliers) — help-ukraine-now, Dec 09 '19 at 15:03
If you want to slice the rows with a z_score less than 3, then this is the syntax: `df[df.z_score < 3]` — Yuca, Dec 09 '19 at 15:34

score 2 · Accepted Answer · answered Dec 09 '19 at 15:01

Just assign a column called 'z_score' and use it in filtering.

import numpy as np
from scipy import stats

df['z_score'] = np.abs(stats.zscore(df["Value1"]))
df.query('z_score <= 3', inplace=True)  # Option 1: keep only rows within the threshold.
df['Value1'] = df['Value1'].mask(df['z_score'] > 3)  # Option 2: replace outliers with NaN instead.
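
As a self-contained illustration of the two alternatives, here is a minimal sketch. The sample data is hypothetical (it is not from the question), the threshold of 3 is taken from the question, and `filtered` and `masked` are names introduced only for this example. It relies on `scipy.stats.zscore` using the population standard deviation (`ddof=0`) by default.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical sample data: 19 ordinary values and one obvious outlier.
df = pd.DataFrame({"Value1": [10.0] * 10 + [11.0] * 9 + [100.0]})

# Absolute z-score of every value in the column.
df["z_score"] = np.abs(stats.zscore(df["Value1"]))

# Alternative 1: drop the rows whose z-score exceeds the threshold.
filtered = df[df["z_score"] <= 3]

# Alternative 2: keep every row but replace the outlying values with NaN.
masked = df["Value1"].mask(df["z_score"] > 3)

print(len(df), len(filtered), int(masked.isna().sum()))
```

The first alternative changes the number of rows, which matters if the other columns (X, Y, Z, Value2) must stay aligned with the original index; the second keeps the shape and leaves gaps that can be handled later, for example with `dropna` or `fillna`.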

Oleg O