3

I have a matrix X, the size of which is 100*2000 double. I want to know which kind of scaling technique is applied to matrix X in the following command, and why it does not use z-score to do scaling?

X = X./repmat(sqrt(sum(X.^2)),size(X,1),1); 
John Saunders
  • 160,644
  • 26
  • 247
  • 397
Angelababy
  • 247
  • 3
  • 7
  • 16
  • 1
    Unlike forum sites, we don't use "Thanks", or "Any help appreciated", or signatures on [so]. See "[Should 'Hi', 'thanks,' taglines, and salutations be removed from posts?](http://meta.stackexchange.com/questions/2950/should-hi-thanks-taglines-and-salutations-be-removed-from-posts). – John Saunders May 25 '15 at 17:48
  • This is equivalent to z-score under the assumption that variables are centered and normally distributed. – damienfrancois May 25 '15 at 19:23
  • @damienfrancois - Interesting point. I've never looked at it that way. – rayryeng May 25 '15 at 19:35

1 Answers1

1

That scaling comes from linear algebra. That's what we call normalizing by producing a unit vector. Assuming that each row is an observation and each column is a feature, what's happening here is that we are going through every observation that you collected and normalizing each feature value over all observations such that the overall length / magnitude of a particular feature for all observations is set to 1.

The bottom division takes a look at each feature and determines the norm or magnitude of the feature over all observations. Once you find these magnitudes, you then take each feature for each observation and divide by their respective magnitudes.

The reason why unit vectors are often employed is to describe a point in feature space with respect to a set of basis vectors. Normalizing by producing unit vectors gives you the smallest possible way to represent one component in feature space and so what's probably happening here is that the observations are now being transformed such that each component / feature is being represented in terms of a set of basis vectors. Each basis vector is one feature in the data.

Check out the Wikipedia article on Unit Vectors for more details: http://en.wikipedia.org/wiki/Unit_vector

rayryeng
  • 102,964
  • 22
  • 184
  • 193