
Given a matrix X with an arbitrary number of rows and columns (each column representing a feature of the dataset), I want to normalize each value to (value - column mean) / (column standard deviation). I came up with the following code, which works. Can it be optimized to use less computation, or is it already optimal?

mu = mean(X);                                     % column means
sigma = std(X);                                   % column standard deviations
x_ones = ones(size(X));                           % matrix of ones, same size as X
zero_mean_X = X - x_ones * diag(mu);              % subtract each column's mean
X_norm = zero_mean_X ./ (x_ones * diag(sigma));   % divide by each column's std

1 Answer


Here is an optimization using MATLAB's bsxfun:

M = mean(X);                   % column means
S = std(X);                    % column standard deviations
Y = bsxfun(@minus, X, M);      % subtract each column's mean
Y = bsxfun(@rdivide, Y, S);    % divide each column by its standard deviation
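bsxfun broadcasts the row vectors M and S over the rows of X without allocating the full ones(size(X)) matrix or forming the dense diag products, which is where the savings come from. On MATLAB R2016b or newer, implicit expansion does the same broadcasting without bsxfun; a minimal sketch, assuming that release or later:

% Implicit expansion (R2016b+): row vectors broadcast over the rows of X
Y = (X - mean(X)) ./ std(X);

If the Statistics Toolbox is available, zscore(X) performs the same column-wise standardization in a single call.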

It is about 4 times faster than the matrix-based version; the benchmark below shows the timing comparison.


X = rand(1000);       % 1000-by-1000 test matrix

t = zeros(100,2);     % column 1: bsxfun timings, column 2: matrix timings

% time the bsxfun version
for ii = 1:100
    tic;
    M = mean(X);
    S = std(X);
    Y = bsxfun(@minus,X, M);
    Y = bsxfun(@rdivide, Y, S);
    t(ii,1) = toc;
end


% time the original matrix-based version
for ii = 1:100
    tic;
    mu = mean(X);
    sigma = std(X);
    x_ones = ones(size(X));
    zero_mean_X = X - x_ones * diag(mu);
    X_norm = zero_mean_X ./ (x_ones * diag(sigma));
    t(ii,2) = toc;
end

figure('Color', 'w');
plot(t);
legend({'bsxfun', 'matrix'});
xlabel('Number of simulations');
ylabel('Time (s)');
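
tic/toc around such short operations can be fairly noisy; MATLAB's timeit runs a function handle many times and reports a typical execution time, which usually gives steadier numbers. A rough sketch of the same comparison, assuming timeit is available (it ships with R2013b and later):

f_bsx = @() bsxfun(@rdivide, bsxfun(@minus, X, mean(X)), std(X));
f_mat = @() (X - ones(size(X)) * diag(mean(X))) ./ (ones(size(X)) * diag(std(X)));
t_bsx = timeit(f_bsx);   % typical runtime of the bsxfun version, in seconds
t_mat = timeit(f_mat);   % typical runtime of the matrix-based version, in seconds
fprintf('bsxfun: %.4g s, matrix: %.4g s\n', t_bsx, t_mat);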