I am trying to calculate z-scores for a dataset using `scipy.stats` and am running into a subtle discrepancy that I cannot figure out. The code runs without errors, but the values it produces are slightly off, and I am concerned this is adversely affecting a PCA that I run on the normalized dataset.
I have the following data in a list:
```python
mylist = [0.565, 0.629, 0.687, 0.797, 0.56, 0.722]
```
I run the following commands to z-score normalize the data using `scipy.stats`:
```python
import scipy.stats

zscore_list = scipy.stats.zscore(mylist)
```
Result:
```
[-1.11793077, -0.36479846, 0.31772769, 1.61217384, -1.17676923, 0.72959692]
```
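For reference, `scipy.stats.zscore` computes `(x - mean) / std` along the given axis, and its `ddof` parameter defaults to 0, so the standard deviation divides by n (the population standard deviation). A minimal NumPy sketch of that formula, assuming the default arguments; the variable names are illustrative only:

```python
import numpy as np
from scipy.stats import zscore

data = np.array([0.565, 0.629, 0.687, 0.797, 0.56, 0.722])

# zscore's default is ddof=0: the standard deviation divides by n (population SD)
population_sd = data.std(ddof=0)
manual_z = (data - data.mean()) / population_sd

scipy_z = zscore(data)  # default axis=0, ddof=0
print(np.allclose(manual_z, scipy_z))  # True
```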
However, when I calculate the z-scores manually with the `statistics` module, I get different values:
```python
import statistics as stats

for x in mylist:
    # z-score: distance from the mean in units of standard deviation
    print((x - stats.mean(mylist)) / stats.stdev(mylist))
```
Result:
```
-1.0205264990693814
-0.33301391022264026
0.29004437341971895
1.471706635500054
-1.074238420073032
0.6660278204452793
```
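The `statistics` module provides two standard deviations: `statistics.stdev` (sample, dividing by n - 1) and `statistics.pstdev` (population, dividing by n). The sketch below computes z-scores with both, using only the standard library; the variable names are illustrative. If the denominator is the only difference, every ratio between the two sets of z-scores should equal sqrt(n / (n - 1)), about 1.0954 for six values, which is also roughly the ratio between the two outputs shown above.

```python
import math
import statistics

mylist = [0.565, 0.629, 0.687, 0.797, 0.56, 0.722]
n = len(mylist)
mean = statistics.mean(mylist)

sample_sd = statistics.stdev(mylist)       # divides by n - 1
population_sd = statistics.pstdev(mylist)  # divides by n

sample_z = [(x - mean) / sample_sd for x in mylist]
population_z = [(x - mean) / population_sd for x in mylist]

# Each population z-score is larger in magnitude by the same factor sqrt(n / (n - 1))
ratios = [p / s for p, s in zip(population_z, sample_z)]
print(all(math.isclose(r, math.sqrt(n / (n - 1))) for r in ratios))  # True
```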
I have tried various fixes, including converting `mylist` to a NumPy array and passing `axis=None` and `ddof=0` to `scipy.stats.zscore`, but none of them changes the result.
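Since `ddof=0` is already the default for `scipy.stats.zscore`, passing it explicitly should leave the output unchanged. A quick way to check whether the remaining difference is just the n versus n - 1 denominator is to request `ddof=1` and compare against the `statistics.stdev`-based values. A minimal sketch, assuming that hypothesis; the variable names are illustrative:

```python
import statistics

import numpy as np
from scipy.stats import zscore

mylist = [0.565, 0.629, 0.687, 0.797, 0.56, 0.722]

# Manual z-scores using the sample standard deviation (n - 1 denominator)
mean = statistics.mean(mylist)
sd = statistics.stdev(mylist)
manual_z = [(x - mean) / sd for x in mylist]

# ddof=0 is the default, so these two calls give identical results
scipy_z_default = zscore(mylist)
scipy_z_ddof0 = zscore(mylist, ddof=0)

# ddof=1 switches zscore to the sample standard deviation
scipy_z_sample = zscore(mylist, ddof=1)

print(np.allclose(scipy_z_default, scipy_z_ddof0))  # True
print(np.allclose(scipy_z_sample, manual_z))        # True
```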