0

I have a NumPy matrix containing numbers:

1,0,1,1
0,1,1,1
0,0,1,0
1,1,1,1

I would like to perform z-score normalization on each column: z_score[y] = (y - mean(column)) / sqrt(var(column)), where y is each element of the column, mean is the mean function, sqrt is the square root function, and var is the variance.

My approach was the following:

x_trainT = x_train.T #transpose the matrix to iterate over columns
for item in x_trainT:
    m = item.mean()
    var = np.sqrt(item.var())
    item = (item - m)/var
x_train = x_trainT.T

I thought that during iteration each row is accessed by reference (as with C# lists, for instance), which would let me change the matrix values by changing the row values.
However, I was wrong: the matrix keeps its original values.

Your help is appreciated.

Elie Asmar
  • Possible duplicate of [computing z-scores for 2D matrices in scipy/numpy in Python](https://stackoverflow.com/questions/2985135/computing-z-scores-for-2d-matrices-in-scipy-numpy-in-python) – Ruzihm Oct 10 '19 at 06:40
  • `item=...` assigns a new object to `item`, breaking its link with iteration variable. So you aren't modifying the array. – hpaulj Oct 10 '19 at 07:00
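
To illustrate the comment above, here is a minimal sketch contrasting rebinding the loop variable with writing through it. The array `x` and the names `rebound` and `in_place` are made up for this example, and the data is stored as floats on purpose.

```python
import numpy as np

# Hypothetical 4x2 float array; both columns have non-zero variance.
x = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [1.0, 1.0]])

# Rebinding: `item = ...` only points the name `item` at a new array,
# so the underlying data is never touched.
rebound = x.copy()
for item in rebound.T:
    item = (item - item.mean()) / item.std()

# In-place: `item[:] = ...` writes through the view into the array.
in_place = x.copy()
for item in in_place.T:
    item[:] = (item - item.mean()) / item.std()

print(np.array_equal(rebound, x))  # the rebinding loop left the data as it was
print(in_place)                    # the in-place loop normalized each column
```

Slice assignment works here because each `item` yielded by iterating over the transpose is a view into the original buffer, not a copy.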

2 Answers

2

I'd recommend avoiding iteration where possible. You can compute the mean and standard deviation column-wise by passing axis=0.

>>> import numpy as np
>>> x_train = np.random.random((5, 8))
>>> norm_x_train = (x_train  - x_train.mean(axis=0)) / x_train.std(axis=0)
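
Applied to the matrix from the question, the same idea might look like the sketch below. The helper name `zscore_columns` is hypothetical, and mapping constant columns to 0 is an assumption of this example, not something the question specifies. The third column of the question's matrix is all ones, so its standard deviation is zero and a plain division would produce NaN values.

```python
import numpy as np

def zscore_columns(a):
    """Hypothetical helper: z-score each column; constant columns become 0."""
    a = np.asarray(a, dtype=float)  # work on floats even if the input is int
    mean = a.mean(axis=0)
    std = a.std(axis=0)             # population std (ddof=0), i.e. sqrt(var)
    # Assumption: replace a zero std with 1 so constant columns map to 0
    # instead of triggering a division by zero.
    safe_std = np.where(std == 0, 1.0, std)
    return (a - mean) / safe_std

x_train = np.array([[1, 0, 1, 1],
                    [0, 1, 1, 1],
                    [0, 0, 1, 0],
                    [1, 1, 1, 1]])
norm_x_train = zscore_columns(x_train)
print(norm_x_train)
```

Broadcasting subtracts the length-4 vector of column means from every row, so no explicit loop or transpose is needed.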
Guillem
1

You'll likely have to index by row number and assign the result back into the array:

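import numpy as np

x_trainT = x_train.T  # transpose is a view, so writes below also change x_train
for i in range(x_trainT.shape[0]):
    item = x_trainT[i]
    m = item.mean()
    sd = np.sqrt(item.var())
    x_trainT[i] = (item - m)/sd  # indexed assignment modifies the array in place
x_train = x_trainT.T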
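
One caveat with indexed assignment is worth checking: the transpose shares memory with the original array, and the array's dtype decides what gets stored. The sketch below uses made-up arrays `x_int` and `x_float` to show that the same loop silently loses the fractional part when the array holds integers, which may be the case for a 0/1 matrix like the one in the question.

```python
import numpy as np

# Hypothetical data: the same values stored as integers and as floats.
x_int = np.array([[1, 0], [1, 1], [0, 0], [1, 1]])
x_float = x_int.astype(float)

for x in (x_int, x_float):
    xT = x.T  # a view: shares memory with x
    for i in range(xT.shape[0]):
        row = xT[i]
        # The right-hand side is a float array; it is cast to x's dtype on store.
        xT[i] = (row - row.mean()) / row.std()

print(np.shares_memory(x_float, x_float.T))  # transpose is a view, not a copy
print(x_float)  # properly normalized columns
print(x_int)    # fractional z-scores were cast to integers
```

Converting with `astype(float)` before the loop avoids the problem.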
Daniel Nguyen