Just a note: if by "standardize the data" you mean mean 0 and stdev 1, that won't guarantee that all values will be between -1 and 1; only about 68% of values in a normal distribution lie between -σ and σ.
In your x array, 85 is a very clear outlier, which is why the corresponding value is greater than 1. Also note that having such large outliers will badly skew your stdev calculation.
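To make the outlier effect concrete, here is a small sketch. The array below is hypothetical (it is not the one from the question); it just combines a few small values with 85.

```python
import numpy as np

# Hypothetical data (an assumption, not the array from the question):
# a few small values plus one large outlier.
data = np.array([1.0, 2.0, 3.0, 4.0, 85.0])

# Z-score standardization: result has mean 0 and standard deviation 1.
z = (data - np.mean(data)) / np.std(data)

print(z)                  # the last entry is well above 1
print(np.std(data))       # stdev including the outlier
print(np.std(data[:-1]))  # stdev without the outlier is far smaller
```

The standardized values still have mean 0 and stdev 1, but the outlier lands outside [-1, 1] and the remaining points are squeezed close together.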
Note that making a copy of x inside the function is not needed as long as you reassign x rather than modify it in place. Because x is passed as an argument of the function, reassigning it inside the function (e.g. x = x - 1) only rebinds the local name, and the global x does not change unless you use the function to overwrite the global x (e.g. x = my_func(x)). The exception is an in-place operation on a NumPy array (e.g. x -= 1 or x[0] = 0), which does modify the array you passed in. Your function's local scope also means that a copy of x made within the function is discarded once the function returns, unless you return it.
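One caveat worth checking with NumPy arrays is the difference between reassigning the argument and modifying it in place. The sketch below uses two hypothetical functions, rebind and in_place, purely for illustration.

```python
import numpy as np

def rebind(a):
    # a - 1 builds a new array; only the local name "a" points to it.
    a = a - 1
    return a

def in_place(a):
    # -= writes into the array object the caller passed in.
    a -= 1
    return a

x = np.array([1.0, 2.0, 3.0])

rebind(x)
after_rebind = x.copy()     # caller's array is unchanged

in_place(x)
after_in_place = x.copy()   # caller's array has been decremented

print(after_rebind, after_in_place)
```

If you want the function to leave its input alone, either avoid the in-place operators or start with something like a = a.copy().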
If you're trying to get all values between -1 and 1, you can do this instead:
import numpy as np

def standardize_x(x):
    # Use x = x - ... rather than x -= ... so the caller's array
    # is not modified in place.
    x = x - np.mean(x)
    # Dividing by the range makes the distance between min(x) and max(x) 1.
    # In the example, this makes the new range between -0.2 and 0.8.
    x = x / (np.amax(x) - np.amin(x))
    # Move x up to start at 0, multiply by 2 so that range is from 0 to 2,
    # then subtract 1 so new minimum is -1 and new maximum is 1.
    x = 2*(x - np.amin(x)) - 1
    return x

# This will not overwrite the original x array.
standardized_x = standardize_x(x)
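The same mapping can also be written as a single min-max expression. This is only a sketch: rescale, lo and hi are hypothetical names, the input data is made up, and the zero-range check is an added assumption about how you would want constant input handled.

```python
import numpy as np

def rescale(values, lo=-1.0, hi=1.0):
    # Hypothetical helper: linearly maps min(values) to lo and max(values) to hi.
    values = np.asarray(values, dtype=float)
    span = np.amax(values) - np.amin(values)
    if span == 0:
        # Assumption: treat constant input as an error rather than divide by zero.
        raise ValueError("all values are equal, so the range is zero")
    return lo + (hi - lo) * (values - np.amin(values)) / span

# Made-up input for illustration only.
scaled = rescale([1, 2, 3, 4, 85])
print(scaled.min(), scaled.max())
```

Passing lo=0.0 and hi=1.0 gives the usual 0 to 1 scaling instead.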
Let me know if anything needs clarification.