I have a pandas DataFrame with 5 columns: X, Y, Z, Value1, Value2.

I want to compute the z-score of column Value1 and then use it to filter out outliers (keeping values with |z| < 3). I can't figure out how to do it properly. I have tried both

from scipy import stats
z_score = np.abs(stats.zscore(df["Value1"]))
df["Value1"] = df["Value1"][(z_score < 3).all(axis=1)]

and

from scipy.stats import zscore
df["Value1"].apply(zscore)

but neither seems to work properly. I'm not sure what to do, since I get either

KeyError: False or IndexError: tuple index out of range.

Yuca
Zygos

1 Answer

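Assign the absolute z-scores to a new column called 'z_score' and use that column for filtering.

import numpy as np
from scipy import stats

df['z_score'] = np.abs(stats.zscore(df["Value1"]))
df.query('z_score <= 3', inplace=True)  # Option 1: drop the outlier rows from the whole DataFrame.
df['Value1'] = df['Value1'].mask(df['z_score'] > 3)  # Option 2: keep all rows, set outlier values to NaN.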
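
For anyone who wants to try this end to end, here is a minimal sketch on made-up data. The toy frame and the names z, filtered and masked are my own assumptions, not from the question. It also hints at why the original attempts likely failed: stats.zscore on a single column gives a one-dimensional result, so there is no axis=1 to reduce over, and Series.apply passes one scalar at a time, which zscore cannot standardize.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical toy data: 19 ordinary values and one obvious outlier.
df = pd.DataFrame({
    "Value1": [10.0] * 19 + [1000.0],
    "Value2": range(20),
})

# Absolute z-score of every entry in Value1 (1-D array, same length as df).
# Converting to NumPy first keeps the result a plain ndarray.
z = np.abs(stats.zscore(df["Value1"].to_numpy()))

# Option 1: boolean indexing keeps only rows within 3 standard deviations.
filtered = df[z <= 3]

# Option 2: keep every row, but replace the outlier values with NaN.
masked = df.assign(Value1=df["Value1"].mask(z > 3))
```

Note that stats.zscore uses the population standard deviation (ddof=0) by default, and with the default nan_policy a NaN in the column propagates to the whole result, so you may want to handle missing values first.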
Oleg O