I have written code to compute z-scores, but its output is greater than 1 and its standard deviation is not 1. What is the error here?

Question

I have written code to standardize data from scratch, but the problem is that its output is sometimes greater than 1. I have no clue what the issue is.

If I've done something stupid, please point out my errors.

here's the code

import numpy as np
import matplotlib.pyplot as plt
def std(x):
    x = x.copy()
    mean = np.mean(x,axis=1, keepdims=True)
    x = x-mean
    x/=np.std(x)
    
    return x
x = np.array([[1,2,3,3.6,7,85,23]])
print(std(x))

Output :

[[-0.59333009 -0.55801282 -0.52269556 -0.5015052  -0.38142649  2.37332037
   0.18364979]]

it sounds like you want to `normalize` the data... – D.L, Mar 13 '22 at 10:22

score 0 · Answer 1 · answered Mar 12 '22 at 16:58

Just a note: if by "standardize the data" you mean mean 0 and stdev 1, that won't guarantee that all values will be between -1 and 1; only about 68% of values in a normal distribution are between -σ and σ.
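As a quick check of this point, here is a minimal sketch using the array from the question. It standardizes along axis 1 for both the mean and the standard deviation, then tests the three properties separately; the variable name z is only for illustration.

```python
import numpy as np

x = np.array([[1, 2, 3, 3.6, 7, 85, 23]])

# Standardize each row: subtract the row mean, divide by the row standard deviation.
z = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

print(np.isclose(z.mean(), 0.0))  # the mean of the z-scores is numerically 0
print(np.isclose(z.std(), 1.0))   # the standard deviation of the z-scores is 1
print(z.max() > 1)                # yet the largest z-score is above 1
```

All three checks print True, so the result is correctly standardized even though one value lies outside [-1, 1].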

In your x array, 85 is a very clear outlier, which is why the corresponding value is greater than 1. Also note that having such large outliers will badly skew your stdev calculation.
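To see how much the single value 85 affects the standard deviation, the sketch below compares np.std with and without it. Dropping the value is only for comparison here, not a recommendation to delete data; the names std_all and std_trimmed are made up for this example.

```python
import numpy as np

x = np.array([1, 2, 3, 3.6, 7, 85, 23])
without_outlier = x[x != 85]  # drop the value 85, for comparison only

std_all = np.std(x)                    # standard deviation of all seven values
std_trimmed = np.std(without_outlier)  # standard deviation of the other six

print(std_all, std_trimmed)
```

With 85 included the standard deviation is roughly 28.3; without it, roughly 7.6.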

Note that making a copy of x inside your function is not needed here. The line x = x - mean already builds a new array and binds the local name x to it, so the later x /= np.std(x) acts on that new array and the array you passed in is left alone. A copy only matters when a function applies in-place operators (-=, /=) directly to its argument, because NumPy then modifies the caller's array. Either way, the global name x only refers to the standardized values if you assign the result back (e.g. x = my_func(x)), and any local array that is not returned disappears once the function has finished.
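The difference between rebinding a name and modifying an array in place can be seen with a small sketch. The functions rebinds and in_place are hypothetical helpers written only for this illustration.

```python
import numpy as np

def rebinds(a):
    # a - a.mean() builds a new array; the caller's array is untouched.
    a = a - a.mean()
    return a

def in_place(a):
    # -= writes into the array that was passed in, so the caller sees the change.
    a -= a.mean()
    return a

x = np.array([1.0, 2.0, 3.0])

rebinds(x)
after_rebind = x.copy()  # snapshot: x still holds the original values
print(after_rebind)

in_place(x)
print(x)                 # x itself is now centered around 0
```

Note that in_place would raise a casting error on an integer array, since the mean is a float; the example uses a float array to avoid that.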

If you're trying to get all values between -1 and 1, you can do this instead:

import numpy as np

def standardize_x(x):
    # Work on a float copy so the in-place operations below
    # do not modify the array that was passed in.
    x = np.array(x, dtype=float)
    x -= np.mean(x)

    # Dividing by the range makes the distance between min(x) and max(x) 1.
    # In the example, this makes the new range between -0.2 and 0.8.

    x /= (np.amax(x) - np.amin(x))

    # Move x up to start at 0, multiply by 2 so that range is from 0 to 2,
    # then subtract 1 so new minimum is -1 and new maximum is 1.

    x = 2*(x - np.amin(x)) - 1

    return x

# This will not overwrite the original x array.
standardized_x = standardize_x(x)
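The same rescaling to the range from -1 to 1 can also be written as a single expression. This is a minimal sketch, assuming the input has at least two distinct values (otherwise the denominator is 0); the function name scale_to_minus_one_one is made up for this example.

```python
import numpy as np

def scale_to_minus_one_one(x):
    # Convert to a float array; the expression below builds a new array,
    # so the input is never modified.
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    # (x - lo) / (hi - lo) maps to [0, 1]; times 2 minus 1 maps to [-1, 1].
    return 2 * (x - lo) / (hi - lo) - 1

x = np.array([[1, 2, 3, 3.6, 7, 85, 23]])
original = x.copy()
scaled = scale_to_minus_one_one(x)

print(scaled.min(), scaled.max())
```

The smallest value (1) maps to -1 and the largest (85) maps to 1, and x itself is left unchanged.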

Let me know if anything needs clarification.
