I am trying to calculate z-scores for a series of columns at once, but when I inspect the data, the means of the new columns are NOT 0, which is what you would expect from a z-score.
If you run the code below, you can see that the a_zscore and d_zscore columns do not have a mean of 0.
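```python
import pandas as pd

df = pd.DataFrame({'a': [500, 4000, 20], 'b': [10, 20, 30], 'c': [30, 40, 50], 'd': [50, 400, 20]})
cols = list(df.columns)  # snapshot the original columns before adding new ones
for col in cols:
    col_zscore = col + '_zscore'
    # z-score using the population standard deviation (ddof=0)
    df[col_zscore] = (df[col] - df[col].mean()) / df[col].std(ddof=0)
print(df.describe())
```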
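The loop can also be written without iterating over columns. The sketch below uses the same sample data and the same ddof=0 convention. It relies on pandas aligning the per-column mean and standard deviation Series on column labels. The `zscores` name is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [500, 4000, 20], 'b': [10, 20, 30],
                   'c': [30, 40, 50], 'd': [50, 400, 20]})

# Subtract each column's mean and divide by its population standard
# deviation (ddof=0); the Series returned by mean()/std() align on column labels.
zscores = (df - df.mean()) / df.std(ddof=0)

# Append the results as *_zscore columns next to the originals.
df = df.join(zscores.add_suffix('_zscore'))
print(df.describe())
```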
My actual data is different, but the results are similar (i.e., non-zero means). I have also tried
```python
from scipy import stats
stats.zscore(df)
```
which gives a similar result. The same transformation in R (scaled.df <- scale(df)) works, though.
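When comparing against R, remember that R's `scale()` divides by the sample standard deviation (n - 1 denominator). `scipy.stats.zscore` defaults to `ddof=0`, as does the loop above. The sketch below, which uses pandas only and the same sample data, shows that both conventions center the columns at 0. They differ only by the constant factor sqrt((n - 1) / n), so the choice of `ddof` affects the scale but not the mean.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [500, 4000, 20], 'b': [10, 20, 30],
                   'c': [30, 40, 50], 'd': [50, 400, 20]})
n = len(df)

# Population standard deviation (ddof=0): same convention as the loop above.
z_pop = (df - df.mean()) / df.std(ddof=0)
# Sample standard deviation (ddof=1): the convention R's scale() uses.
z_sample = (df - df.mean()) / df.std(ddof=1)

# The two results differ only by a constant factor of sqrt((n - 1) / n).
print(np.allclose(z_sample, z_pop * np.sqrt((n - 1) / n)))
```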
Does anyone know what is going on here? The affected columns contain larger values, but it should still be possible to z-transform them.
EDIT: As Rob pointed out, the means are essentially 0. The tiny non-zero values are floating-point rounding error.
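To confirm this, compare the means with 0 using a tolerance instead of exact equality. The sketch below uses `numpy.allclose`, whose default absolute tolerance is 1e-8. It rebuilds the z-scores from the same sample data. Any non-zero means it prints should be tiny rounding residue, not a real offset.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [500, 4000, 20], 'b': [10, 20, 30],
                   'c': [30, 40, 50], 'd': [50, 400, 20]})
zscores = (df - df.mean()) / df.std(ddof=0)

means = zscores.mean()
print(means)  # any non-zero entries here are floating-point rounding error

# Compare against 0 with a tolerance rather than exact equality.
print(np.allclose(means, 0.0))

# Rounding is another way to hide the residue when displaying results.
print(means.round(10))
```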