2

I am trying to calculate the z-score for a vector of 5000 rows that contains many NaN values. I have to calculate this many times, so I don't want to use a loop; I was hoping to find a vectorized solution.

The loop solution:

for i = 1:size(val,1)
   vec(i,1) = (val(i,1) - nanmean(val(:,1)))/nanstd(val(:,1));
end

A partially vectorized solution:

zscore(vec(~isnan(vec)))

But this returns a vector whose length is that of the original vector minus the number of NaN values, so it is not the same size as the original.

I want to calculate the z-score for the vector and then interpolate the missing data afterwards. I have to do this hundreds of times, so I am looking for a fast vectorized approach.

nrz
user1129988

4 Answers

1

This is a vectorized solution:

% generate some example data with NaNs.

val = reshape(magic(4), 16, 1);
val(10) = NaN;
val(17) = NaN;

Here's the code:

valWithoutNaNs = val(~isnan(val));
valMean = mean(valWithoutNaNs);
valSD = std(valWithoutNaNs);
valZscore = (val-valMean)/valSD;

Then column vector valZscore contains deviations (Z scores), and has NaN values for NaN values in val, the original measurement data.
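
The question also asks about interpolating the missing data afterwards. A minimal sketch of one way to do that, assuming linear interpolation over the sample index is acceptable; the example data and the names `ok`, `idx`, and `valFilled` are made up for illustration:

```matlab
% Hypothetical example data: a column vector with missing entries.
val = [2; 4; NaN; 8; 10; NaN; 14];

% Z-score using only the non-NaN entries; NaNs in val stay NaN.
ok = ~isnan(val);
valZscore = (val - mean(val(ok))) / std(val(ok));

% Fill the gaps by linear interpolation over the sample index.
% Leading or trailing NaNs would stay NaN, because interp1 does not
% extrapolate with the 'linear' method unless asked to.
idx = (1:numel(val))';
valFilled = valZscore;
valFilled(~ok) = interp1(idx(ok), valZscore(ok), idx(~ok), 'linear');
```

Because the z-score is a linear transformation, interpolating before or after standardizing gives the same filled values, as long as the mean and standard deviation are computed from the observed entries only.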

nrz
1

Sorry this answer is 6 months late, but for anyone else who comes across this thread:

The accepted answer isn't fully vectorised in that it doesn't do what the real zscore does so beautifully: That is, do zscores along a particular dimension of a matrix.

If you want to calculate zscores of a large number of vectors at once, as the OP says he is doing, the best solution is this:

Z = bsxfun(@rdivide, bsxfun(@minus, X, nanmean(X)), ...
                     nanstd(X));

To do it along an arbitrary dimension, pass the dimension to nanmean and nanstd (for nanstd it is the third argument, after the normalization flag), and bsxfun takes care of the rest.

nanzscore = @(X,DIM) bsxfun(@rdivide, bsxfun(@minus, X, nanmean(X,DIM)), ...
                                      nanstd(X,0,DIM));
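
On newer MATLAB releases the same idea can be written without bsxfun or the Statistics Toolbox. This is a sketch under two assumptions: the 'omitnan' option of mean and std is available (R2015a or later) and implicit expansion is available (R2016b or later). The matrix X and the names mu and sd are illustrative only.

```matlab
% Hypothetical data: each column is one series, NaN marks missing values.
X = [1 10; 2 NaN; 3 30; NaN 50];

dim = 1;                          % standardize down the columns
mu = mean(X, dim, 'omitnan');     % per-column mean, ignoring NaNs
sd = std(X, 0, dim, 'omitnan');   % 0 selects the N-1 normalization
Z  = (X - mu) ./ sd;              % implicit expansion replaces bsxfun
```

Z has the same size as X, with NaN wherever X is NaN.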
Sanjay Manohar
0

An anonymous function (for a single vector):

nanZ = @(xIn)(xIn-nanmean(xIn))/nanstd(xIn);

nanZ(vectorWithNans)

0

Vectorized version of the anonymous function from the other answer (assumes observations are in rows, variables in columns):

nanZ = @(xIn)(xIn-repmat(nanmean(xIn),size(xIn,1),1))./repmat(nanstd(xIn),size(xIn,1),1);
nanZ(matrixWithNans)
Tushar Gupta - curioustushar
Dan