I need to calculate the cumulative variance of a vector. I have written a script, but it takes too long to compute the cumulative variance of my vectors of size 1*100000. Is there a faster way to compute this cumulative variance?

This is the code I am using

%% Create the random vectors whose variances will be calculated

d = 100000; % dimension of the vectors
nv = 6;     % number of vectors
for j = 1:nv
    VItimeseries(:,j) = rand(d,1); % final matrix with the vectors as columns
end

%% Script to calculate the cumulative variance of each column of the matrix
VectorVarianza=0;
VectoFinalVar=0;
VectorFinalTotalVAriances=zeros(d,nv);
for k=1:nv % loop over the columns
    for j=1:numel(VItimeseries(:,k)) % loop over the rows
        Vector=VItimeseries(:,k);
        VectorVarianza(1:j)= Vector(1:j); % elements 1..j, whose variance is computed independently
        VectorFinalVar(j,k)= var(VectorVarianza); % variance of the first j elements
    end
    VectorFinalTotalVAriances(:,k)=VectorFinalVar(:,k); % final matrix with the cumulative variances
end
marc_s
JuanMuñoz
  • `VectorFinalVar(j,k)=` is expensive because the array is not preallocated. Why not write into `VectorFinalTotalVAriances` directly? `Vector=VItimeseries(:,k);` should be moved out of the loop over `j`, where it is repeated unnecessarily. Those changes will speed up the code considerably. But in the end you are using an O(n^2) algorithm where O(n) is possible: don't use `var` on each subarray; instead, accumulate sum(x) and sum(x^2) as you move through the array, then use those to compute the variance. And you can use `cumsum` to compute those without an explicit loop. – Cris Luengo Oct 13 '19 at 15:09
  • @CrisLuengo you are absolutely right. I am surprised that MATLAB does not have a command for this cumulative variance. I tried `movvar(A, 100000)`, where 100000 is the length of the sliding window over which the variance is calculated, but it does not give me the expected result. :) – JuanMuñoz Oct 13 '19 at 19:17

1 Answer

Looping over the n elements of x, and within the loop computing the variance of all elements up to i using `var(x(1:i))`, amounts to an O(n^2) algorithm. This is inherently expensive.

Sample variance (what `var` computes) is defined as `sum((x-mean(x)).^2) / (n-1)`, with `n = length(x)`. This can be rewritten as `(sum(x.^2) - sum(x).^2 / n) / (n-1)`. This formula allows us to accumulate `sum(x)` and `sum(x.^2)` within a single loop, then compute the variance later. It also allows us to compute the cumulative variance in O(n).
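As a quick sanity check of that rewriting, the sketch below evaluates both forms on some arbitrary data and compares them with the built-in `var`. The variable names `v_def`, `v_alt` and `v_ref` are purely illustrative and do not appear in the question.

```matlab
x = randn(100,1);                              % arbitrary test data
n = numel(x);

v_def = sum((x - mean(x)).^2) / (n-1);         % definition of the sample variance
v_alt = (sum(x.^2) - sum(x).^2 / n) / (n-1);   % rewritten form using the two sums
v_ref = var(x);                                % built-in reference

% Both differences should be tiny (floating-point rounding only)
disp(abs([v_def, v_alt] - v_ref))
```

If the two forms agree with `var` for this data, the running sums of `x` and `x.^2` are all that needs to be tracked.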

For a vector x, we'd have the following loop:

x = randn(100,1); % some data

v = zeros(size(x)); % cumulative variance
s = x(1);           % running sum of x
s2 = x(1).^2;       % running sum of square of x
for ii = 2:numel(x) % loop starts at 2, for ii=1 we cannot compute variance
   s = s + x(ii);
   s2 = s2 + x(ii).^2;
   v(ii) = (s2 - s.^2 / ii) / (ii-1);
end

We can avoid the explicit loop by using cumsum:

s = cumsum(x);
s2 = cumsum(x.^2);
n = (1:numel(x)).';
v = (s2 - s.^2 ./ n) ./ (n-1); % v(1) will be NaN, rather than 0 as in the first version
v(1) = 0;                      % so we set it to 0 explicitly here
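
To gain some confidence in the `cumsum` version, one could compare it against the slow approach of calling `var` on every prefix of a short vector. This is only a sketch; `v_naive` and `max_err` are hypothetical names introduced for the comparison.

```matlab
x = randn(200,1);                  % short vector, so the O(n^2) reference stays cheap

% O(n) version based on cumulative sums
n = (1:numel(x)).';
v = (cumsum(x.^2) - cumsum(x).^2 ./ n) ./ (n-1);
v(1) = 0;                          % replace the NaN produced by 0/0 for the first element

% O(n^2) reference: variance of each prefix x(1:ii)
v_naive = zeros(size(x));
for ii = 2:numel(x)
    v_naive(ii) = var(x(1:ii));
end

max_err = max(abs(v - v_naive));   % expected to be at the level of rounding error
disp(max_err)
```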

The code in the OP computes the cumulative variance for each column of a matrix. The code above can be trivially adapted to do the same:

s = cumsum(VItimeseries,1);     % cumulative sum explicitly along columns
s2 = cumsum(VItimeseries.^2,1);
n = (1:size(VItimeseries,1)).'; % use number of rows, rather than `numel`.
v = (s2 - s.^2 ./ n) ./ (n-1);
v(1,:) = 0;                     % fill first row with zeros, not just first element
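
The column-wise version can be checked the same way on a small matrix. The sketch below assumes MATLAB R2016b or later, because dividing the matrix `s.^2` by the column vector `n` relies on implicit expansion. It also cross-checks against `movvar` with a trailing window `[size(X,1)-1 0]`, which, with the default endpoint handling, should shrink to the elements seen so far. The names `X`, `v_naive` and `v_mov` are illustrative.

```matlab
X = rand(50,3);                          % small stand-in for VItimeseries
d = size(X,1);

% O(n) cumulative variance along each column
n = (1:d).';
v = (cumsum(X.^2,1) - cumsum(X,1).^2 ./ n) ./ (n-1);
v(1,:) = 0;

% Reference: variance of the first k rows of every column
v_naive = zeros(size(X));
for k = 2:d
    v_naive(k,:) = var(X(1:k,:), 0, 1);
end

% Cross-check: trailing moving variance whose window reaches back to the first row
v_mov = movvar(X, [d-1 0], 0, 1);

disp(max(abs(v(:) - v_naive(:))))
disp(max(max(abs(v(2:end,:) - v_mov(2:end,:)))))
```

A scalar window such as `movvar(A, 100000)` is centered on each element, which would explain why it does not reproduce the cumulative variance.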
Cris Luengo