
I am reposting a question I asked earlier this week. Because of a missing tag, it went unnoticed (essentially only I viewed it).

I have two large vectors, `values` and `indices`. I need to sum the elements of `values` segment by segment, where segment `hh` runs from `values(indices(hh)+1)` to `values(indices(hh+1))`, as in this brute-force example:

```matlab
% The two vectors, which are given, look like this:
N = 3e7;
values = (rand(N, 1) > 0.3);
indices = cumsum(ceil(4*rand(N, 1)));
% keep the boundaries that lie in (1, N) and add the end points 0 and N
indices = [0; indices(find(indices > 1, 1, 'first'):find(indices < N, 1, 'last')); N];
HH = numel(indices) - 1;   % number of segments

% This is the brute-force solution
tic
out1 = zeros(HH, 1);
for hh = 1:HH
  out1(hh) = sum(values((indices(hh)+1):indices(hh+1)));
end
toc
```
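A tiny, hand-checkable input makes the segment convention explicit: segment `hh` covers `values(indices(hh)+1)` through `values(indices(hh+1))`. The toy numbers below are made up for illustration, with 0/1 doubles standing in for the logical vector.

```matlab
% Hypothetical toy input: 6 values split into 3 segments by indices = [0 2 5 6]
values  = [1; 0; 1; 1; 0; 1];
indices = [0; 2; 5; 6];
HH = numel(indices) - 1;

out1 = zeros(HH, 1);
for hh = 1:HH
  % segment hh is values(indices(hh)+1 : indices(hh+1))
  out1(hh) = sum(values((indices(hh)+1):indices(hh+1)));
end
% segments are [1 0], [1 1 0] and [1], so out1 is [1; 2; 1]
```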

A more efficient approach uses `accumarray`:

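```matlab
tic
indices2 = diff(indices);                  % segment lengths
new_inds = (1:HH+1)';
tmp = zeros(N, 1);
tmp(cumsum(indices2)-indices2+1)=1;        % mark the first element of each segment
new_inds_long = new_inds(cumsum(tmp));     % segment number of every element
out2 = accumarray(new_inds_long, values);  % sum the values within each segment
toc
```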
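On a hypothetical toy input, the intermediate vectors of the `accumarray` approach can be inspected directly. Since `new_inds` is `(1:HH+1)'`, the lookup `new_inds(cumsum(tmp))` returns `cumsum(tmp)` unchanged, so this sketch uses `cumsum(tmp)` directly as the label vector.

```matlab
% Hypothetical toy input (0/1 doubles standing in for the logical vector)
values  = [1; 0; 1; 1; 0; 1];
indices = [0; 2; 5; 6];
N = numel(values);

indices2 = diff(indices);                   % segment lengths: [2; 3; 1]
tmp = zeros(N, 1);
tmp(cumsum(indices2) - indices2 + 1) = 1;   % 1 at each segment start: positions 1, 3, 6
labels = cumsum(tmp);                       % segment number per element: [1 1 2 2 2 3]'
out2 = accumarray(labels, values);          % per-segment sums: [1; 2; 1]
```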

A better solution uses a cumulative sum:

```matlab
tic
out3 = cumsum(values);          % running totals
out3 = out3(indices(2:end));    % running total at the end of each segment
out3 = [out3(1); diff(out3)];   % per-segment sums
toc
```
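Tracing the `cumsum` approach on a hypothetical toy input shows why it works: each segment sum is the difference between two running totals. The second form, which prepends a zero, is a small variant that does not rely on `indices(1)` being 0. With 0/1 values the running totals are exact integers. With non-integer floating-point values the differences can pick up rounding error.

```matlab
% Hypothetical toy input
values  = [1; 0; 1; 1; 0; 1];
indices = [0; 2; 5; 6];

cs   = cumsum(values);           % running totals: [1 1 2 3 3 4]'
ends = cs(indices(2:end));       % totals at segment ends: [1 3 4]'
out3 = [ends(1); diff(ends)];    % per-segment sums: [1 2 1]'

% Variant with a leading zero, valid for any first boundary indices(1) >= 0
cs0   = [0; cumsum(values)];     % cs0(k+1) = sum(values(1:k))
out3b = diff(cs0(indices + 1));  % also [1 2 1]'
```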

The three solutions are equivalent:

```matlab
all(out1 == out2)
all(out1 == out3)
```
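Here is a scaled-down, self-contained version of the equivalence check. It uses the same generator with a smaller, arbitrarily chosen `N`. `isequal` compares sizes as well as values and returns a single logical, which makes it a slightly stricter test than `all(a == b)`.

```matlab
% Same generator as above, with a hypothetical smaller N
N = 1e4;
values = (rand(N, 1) > 0.3);
indices = cumsum(ceil(4*rand(N, 1)));
indices = [0; indices(find(indices > 1, 1, 'first'):find(indices < N, 1, 'last')); N];
HH = numel(indices) - 1;

% brute force
out1 = zeros(HH, 1);
for hh = 1:HH
  out1(hh) = sum(values((indices(hh)+1):indices(hh+1)));
end

% accumarray with segment labels
indices2 = diff(indices);
tmp = zeros(N, 1);
tmp(cumsum(indices2) - indices2 + 1) = 1;
out2 = accumarray(cumsum(tmp), values);

% cumulative sum
out3 = cumsum(values);
out3 = out3(indices(2:end));
out3 = [out3(1); diff(out3)];

% isequal also checks that the sizes match
assert(isequal(out1, out2, out3));
```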

My question: since this is such a basic operation, is there a faster, well-known approach or built-in function that does the same thing and that I am overlooking or simply not aware of?

1 Answer


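If the way you generate your indices is not just a stand-in for some other process, it could be improved. Currently most of the generated numbers are wasted. Each step `ceil(4*rand)` averages 2.5, so only about N/2.5 (roughly 40%) of the N cumulative sums fall below N. Instead: 1) determine the number of indices you want (binomial distribution), then 2) generate only the indices you actually use.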
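One possible reading of this suggestion, sketched under assumptions: every interior position 1..N-1 becomes a segment boundary independently with probability `p`. The number of boundaries is then binomial, and only those positions are drawn. The value `p = 0.4` is a hypothetical choice that matches the average gap of 2.5 in the original generator. However, the gaps here are geometric rather than uniform on 1..4, so this produces a different distribution. `binornd` requires the Statistics and Machine Learning Toolbox.

```matlab
N = 3e7;
p = 0.4;                            % hypothetical boundary probability (mean gap 1/p = 2.5)
K = binornd(N - 1, p);              % number of interior boundaries (Statistics Toolbox)
inner = sort(randperm(N - 1, K))';  % K distinct positions in 1..N-1
indices = [0; inner; N];            % add the end points, as in the original
HH = numel(indices) - 1;            % number of segments
```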

Daniel
Comment: I don't understand: values and indices are my input variables and they are given … Of course I can manipulate them (as in the second proposed solution, the one using accumarray). – Oct 26 '13 at 12:30