26

Matlab/Octave algorithm example:

 input vector: [ 1 0 2 0 7 7 7 0 5 0 0 0 9 ]
output vector: [ 1 1 2 2 7 7 7 7 5 5 5 5 9 ]

The algorithm is very simple: it goes through the vector and replaces all zeros with the last non-zero value. It seems trivial, and is so when done with a slow for (i=1:length) loop and being able to refer to the previous element (i-1), but looks impossible to be formulated in the fast vectorized form. I tried the merge() and shift() but it only works for the first occurrence of zero, not an arbitrary number of them.

Can it be done in a vectorized form in Octave/Matlab or must C be used for this to have sufficient performance on big amount of data?


I have another similar slow for-loop algorithm to speed up and it seems generally impossible to refer to previous values in a vectorized form, like an SQL lag() or group by or loop (i-1) would easily do. But Octave/Matlab loops are terribly slow.

Has anyone found a solution to this general problem or is this futile for fundamental Octave/Matlab design reasons?


Performance benchmark:

SOLUTION 1 (slow loop)

in = repmat([ 1 0 2 0 7 7 7 0 5 0 0 0 9 ] ,1 ,100000);
out = in;
tic
for i=2:length(out) 
   if (out(i)==0) 
      out(i)=out(i-1);
   end
end
toc
[in(1:20); out(1:20)] % test to show side by side if ok

Elapsed time is 15.047 seconds.

SOLUTION 2 by Dan (~80 times faster)

in = V = repmat([ 1 0 2 0 7 7 7 0 5 0 0 0 9 ] ,1 ,100000);
tic;
d = double(diff([0,V])>0);
d(find(d(2:end))+1) = find(diff([0,~V])==-1) - find(diff([0,~V])==1);
out = V(cumsum(~~V+d)-1);
toc;
[in(1:20); out(1:20)] % shows it works ok

Elapsed time is 0.188167 seconds.

15.047 / 0.188167 = 79.97 times improvement

SOLUTION 3 by GameOfThrows (~115 times faster)

in = repmat([ 1 0 2 0 7 7 7 0 5 0 0 0 9 ] ,1 ,100000);
a = in;
tic;
pada = [a,888];
b = pada(pada >0);
bb = b(:,1:end-1);
c = find (pada==0);
d = find(pada>0);
len = d(2:end) - (d(1:end-1));
t = accumarray(cumsum([1,len])',1);
out = bb(cumsum(t(1:end-1)));
toc;

Elapsed time is 0.130558 seconds.

15.047 / 0.130558 = 115.25 times improvement

Magical SOLUTION 4 by Luis Mendo (~250 times faster)

in = repmat([ 1 0 2 0 7 7 7 0 5 0 0 0 9 ] , 1, 100000);
tic;
u = nonzeros(in);
out = u(cumsum(in~=0)).';
toc;

Elapsed time is 0.0597501 seconds.

15.047 / 0.0597501 = 251.83 times improvement


(Update 2019/03/13) Timings with MATLAB R2017a:

Slow loop:    0.010862 seconds.
Dan:          0.072561 seconds.
GameOfThrows: 0.066282 seconds.
Luis Mendo:   0.032257 seconds.
fillmissing:  0.053366 seconds.

So we draw yet again the same conclusion: loops in MATLAB are no longer slow!


See also: Trivial/impossible algorithm challenge in Octave/Matlab Part II: iterations memory

Cris Luengo
  • 55,762
  • 10
  • 62
  • 120
Paweł Załuski
  • 498
  • 5
  • 12
  • What version of MATLAB are you using? The loops really aren't slow anymore (for some years now) since they implemented the JIT compiler. – Dan Dec 02 '15 at 11:44
  • Also are the numbers between the zeros always the same or would this also be a possible input `[ 1 0 2 0 7 8 7 0 5 0 0 0 9 ]`? – Dan Dec 02 '15 at 11:47
  • @Dan I think this can be done, no? – GameOfThrows Dec 02 '15 at 12:18
  • Thx Dan. I'm using Octave 3.8.2 - the newest in Debian stable distro - and loops are thousands times slower than vectorized pieces ("warning: jit_enable: JIT compiling not available"). The numbers between zeros are arbitrary doubles, can be regarded random. The blocks of zeros are pre-set and can be changed to anything like NaN's or Inf's, because they are the places in the vector which were left unpopulated by the previous algorithm and hence need to get filled like described. – Paweł Załuski Dec 02 '15 at 12:20
  • 1
    @GameOfThrows yes I think so. Just added my attempt – Dan Dec 02 '15 at 12:27
  • 1
    @Pawel could you do a `timeit` comparison between the two answers and a naive `for`-loop solution in your Octave for us? It will be interesting to see if these options give any performance improvements – Dan Dec 02 '15 at 12:35
  • 1
    @Dan sure working on it – Paweł Załuski Dec 02 '15 at 13:02
  • 1
    @Pawel probably easier to use `isequal` than `[in(1:20); out(1:20)] # test to show side by side if ok`. – Dan Dec 02 '15 at 13:09
  • @Dan thx for the tip, one can learn a lot from you guys – Paweł Załuski Dec 02 '15 at 13:38
  • 2
    @Pawel Can you also try Luis Mendo's solution, which I suspect should be the fastest, and also the cleanest. – GameOfThrows Dec 02 '15 at 13:46
  • @Pawel, I posted another solution, which I'd assume to be fast as well. Could you include it in your benchmark? – Robert Seifert Dec 03 '15 at 07:20
  • @thewaywewalk Thanks for your solution! Of course - I will run it and include it in the benchmark - as soon as I get my hands on the same system, today if possible – Paweł Załuski Dec 03 '15 at 11:51
  • @Pawel don't worry I benched it myself (see my answer) – Robert Seifert Dec 03 '15 at 12:06

6 Answers6

22

The following simple approach does what you want, and is probably very fast:

in = [1 0 2 0 7 7 7 0 5 0 0 0 9];
t = cumsum(in~=0);
u = nonzeros(in);
out = u(t).';
Luis Mendo
  • 110,752
  • 13
  • 76
  • 147
  • 1
    Luis, there's a new function `fillmissing` since R2016b, your solution is faster than that. :) -- Of course, the plain old loop is now about 3 times faster than your solution... :) – Cris Luengo Mar 13 '19 at 15:14
7

I think it is possible, let's start with the basics, you want to capture where number is greater than 0:

 a = [ 1 0 2 0 7 7 7 0 5 0 0 0 9 ] %//Load in Vector
 pada = [a,888];  %//Pad A with a random number at the end to help in case the vector ends with a 0
 b = pada(find(pada >0)); %//Find where number if bigger than 0
 bb = b(:,1:end-1);     %//numbers that are bigger than 0
 c = find (pada==0);   %//Index where numbers are 0
 d = find(pada>0);     %//Index where numbers are greater than 0
 length = d(2:end) - (d(1:end-1));  %//calculate number of repeats needed for each 0 trailing gap.
 %//R = [cell2mat(arrayfun(@(x,nx) repmat(x,1,nx), bb, length,'uniformoutput',0))]; %//Repeat the value

 ----------EDIT--------- 
 %// Accumarray and cumsum method, although not as nice as Dan's 1 liner
 t = accumarray(cumsum([1,length])',1);
 R = bb(cumsum(t(1:end-1)));

NOTE: I used arrayfun, but you can use accumarray as well.I think this demonstrates that it is possible to do this in parallel?

R =

Columns 1 through 10

 1     1     2     2     7     7     7     7     5     5

Columns 11 through 13

 5     5     9

TESTs:

a = [ 1 0 2 0 7 7 7 0 5 0 0 0 9 0 0 0 ]

R =

Columns 1 through 10

 1     1     2     2     7     7     7     7     5     5

Columns 11 through 16

 5     5     9     9     9     9

PERFORMANCE:

a = repmat([ 1 0 2 0 7 7 7 0 5 0 0 0 9 ] ,1,10000); %//Double of 130,000
Arrayfun Method : Elapsed time is 6.840973 seconds.
AccumArray Method : Elapsed time is 2.097432 seconds.
GameOfThrows
  • 4,510
  • 2
  • 27
  • 44
  • How does this compare with a simple `for`-loop though in terms of performance time though? `arrayfun` doesn't really count as vectorization. It would be interesting to compare as I don't know if it has an edge over loops because the OP is using Octave and doesn't get JIT benefits. The comparison would have to be done in Octave though. – Dan Dec 02 '15 at 12:31
  • 1
    I think it is possible to do it with `accumarray` and `cumsum` combined. Let me update this – GameOfThrows Dec 02 '15 at 12:34
7

I think is a vectorized solution. Works on your example:

V = [1 0 2 0 7 7 7 0 5 0 0 0 9]
%// This is where the numbers you will repeat lie. You have to cast to a double otherwise later when you try assign numbers to it it caps them at logical 1s
d = double(diff([0,V])>0)
%// find(diff([0,~V])==-1) - find(diff([0,~V])==1) is the length of each zero cluster
d(find(d(2:end))+1) = find(diff([0,~V])==-1) - find(diff([0,~V])==1)
%// ~~V is the same as V ~= 0
V(cumsum(~~V+d)-1)
Dan
  • 45,079
  • 17
  • 88
  • 157
  • 2
    Also, I padded a non-zero number at the end of V, if you don't it does not work for when V ends in 0; I think. – GameOfThrows Dec 02 '15 at 12:34
  • 2
    @GameOfThrows also true. This will error in that case `find(diff([0,~V])==-1) - find(diff([0,~V])==1)` – Dan Dec 02 '15 at 12:36
6

Here is another solution, using linear interpolation with previous neighbor lookup.

I assume it to be quite fast as well, as there are just look-ups and indexing and no calculations:

in = [1 0 2 0 7 7 7 0 5 0 0 0 9]
mask = logical(in);
idx = 1:numel(in);
in(~mask) = interp1(idx(mask),in(mask),idx(~mask),'previous');
%// out = in

Explanation

You need to create an index vector:

idx = 1:numel(in)  $// = 1 2 3 4 5 ...

And a logical mask, masking all your non-zero values:

mask = logical(in);

This way you get the grid points idx(mask) and grid data in(mask) for the interpolation. The query points idx(~mask) are indices of the zero data. The query data in(~mask) is then "calculated" by next previous neighbor interpolation, so it basically looks in the grid what is the value for the previous grid point. Exactly what you want. Unfortunately the involved functions have a huge overhead for all thinkable cases, thats why it is still slower than Luis Mendo's Answer, though there are no arithmetic calculations involved.


Furthermore one could reduce the overhead of interp1 a little:

F = griddedInterpolant(idx(mask),in(mask),'previous');
in(~mask) = F(idx(~mask));

But there is not too much effect.


in =   %// = out

     1     1     2     2     7     7     7     7     5     5     5     5     9

Benchmark

0.699347403200000 %// thewaywewalk
1.329058123200000 %// GameOfThrows
0.408333643200000 %// LuisMendo
1.585014923200000 %// Dan

Code

function [t] = bench()
    in = repmat([ 1 0 2 0 7 7 7 0 5 0 0 0 9 ] ,1 ,100000);

    % functions to compare
    fcns = {
        @() thewaywewalk(in);
        @() GameOfThrows(in);
        @() LuisMendo(in);
        @() Dan(in);
    }; 

    % timeit
    t = zeros(4,1);
    for ii = 1:10;
        t = t + cellfun(@timeit, fcns);
    end
    format long
end

function in = thewaywewalk(in) 
    mask = logical(in);
    idx = 1:numel(in);
    in(~mask) = interp1(idx(mask),in(mask),idx(~mask),'previous');
end
function out = GameOfThrows(a) 
    pada = [a,888];
    b = pada(find(pada >0));
    bb = b(:,1:end-1);
    c = find (pada==0);
    d = find(pada>0);
    length = d(2:end) - (d(1:end-1));
    t = accumarray(cumsum([1,length])',1);
    out = bb(cumsum(t(1:end-1)));
end
function out = LuisMendo(in) 
    t = cumsum(in~=0);
    u = nonzeros(in);
    out = u(t).';
end
function out = Dan(V) 
    d = double(diff([0,V])>0);
    d(find(d(2:end))+1) = find(diff([0,~V])==-1) - find(diff([0,~V])==1);
    out = V(cumsum(~~V+d)-1);
end
Robert Seifert
  • 25,078
  • 11
  • 68
  • 113
4

New in MATLAB R2016b: fillmissing, it does exactly as described in the question:

in = [ 1 0 2 0 7 7 7 0 5 0 0 0 9 ];
in(in==0) = NaN;
out = fillmissing(in,'previous');

[This new functionality discovered in this duplicate question].

Cris Luengo
  • 55,762
  • 10
  • 62
  • 120
2

Vector operations generally assume independence of the individual items. If you have a dependence on an earlier item, then looping is the best way to do it.

Some extra background on matlab: In matlab the operations are typically faster not because of vector operations specifically, but because a vector operation simply does the loop in native C++ code instead of through the interpreter

Scott M.
  • 7,313
  • 30
  • 39
  • but the input vector is given at the beginning and each update does not depend on the previous update. So technically you should be able to do this without a loop. – GameOfThrows Dec 02 '15 at 12:20
  • 3
    This is more of a comment than an answer. Also, this problem is vertorizable in the MATLAB sense. – Dan Dec 02 '15 at 12:54