I have the following piece of code that computes percentiles from a data set ("Data"). It is quite slow because the inputs are large: "Data" has approx. 500,000 elements, and "Indices" assigns them to 10080 unique groups.

Are there any suggestions for making this piece of code more efficient? For example, could I somehow omit the for-loop?

k = 1;
for i = 0:0.5:100 % percentiles in 0.5 steps
    FRACTILE(:,k) = accumarray(Indices, Data, [], @(x) prctile(x, i));
    k = k + 1;
end
Jonas
  • Just as a note: `accumarray` is a wrapper for a loop anyway. – Adriaan Jan 28 '16 at 12:59
  • FRACTILE seems to be a matrix; what are you actually trying to do? Usually, if you want to get rid of a for loop, you can try to turn it into a matrix operation, something MATLAB is quite good at (but bear in mind memory limitations). – Michele Ricciardi Jan 28 '16 at 13:09
  • Yes indeed, it is a matrix. I'm storing the individual percentile results (0:0.5:100) for each of the 10080 unique indices. – Jonas Jan 28 '16 at 13:22

1 Answer

Calling prctile again and again on the same data is what causes your performance issues. Call it once per group instead, passing all percentiles at once:

FRACTILE = cell2mat(accumarray(Indices, Data, [], @(x) {prctile(x, 0:0.5:100)}));

Letting prctile evaluate all 201 percentiles in one call costs roughly as much computation time as two iterations of your original code. There are two reasons: first, prctile is faster when it handles all percentiles for a group in a single call; second, accumarray is now called only once.
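
To check that the single-call version returns the same matrix as the loop, a minimal sketch along these lines may help. The synthetic inputs, their sizes, and the variable names (nGroups, p, loopResult, vecResult, maxDiff) are assumptions made for illustration and are not taken from the question's data. The reshape call is a defensive addition that forces each group's result into a row, whatever orientation prctile returns.

```matlab
% Synthetic stand-in for the question's inputs (sizes are assumptions,
% kept small so the slow loop finishes quickly).
rng(0);                                  % reproducible random data
nGroups = 50;
Indices = randi(nGroups, 5000, 1);       % group label for each sample
Indices(1:nGroups) = (1:nGroups)';       % make sure no group is empty
Data    = randn(5000, 1);
p       = 0:0.5:100;                     % the 201 requested percentiles

% Loop approach: one accumarray pass per percentile.
loopResult = zeros(nGroups, numel(p));
for k = 1:numel(p)
    loopResult(:,k) = accumarray(Indices, Data, [], @(x) prctile(x, p(k)));
end

% Single-pass approach: all percentiles of a group are computed at once.
% Each cell holds a 1-by-201 row, so cell2mat stacks the cells into an
% nGroups-by-201 matrix.
vecResult = cell2mat(accumarray(Indices, Data, [], ...
    @(x) {reshape(prctile(x, p), 1, [])}));

% Largest element-wise deviation between the two results.
maxDiff = max(abs(loopResult(:) - vecResult(:)));
fprintf('max abs difference: %g\n', maxDiff);
```

Wrapping each approach in tic/toc on your real data should show how large the speed-up is in your case. Note that a group index with no samples would yield an empty cell, which cell2mat cannot stack with the other rows, so this sketch assumes every index from 1 to max(Indices) occurs at least once.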

Daniel
  • Thank you very much, that was exactly what I was looking for. Easy to understand and much faster! – Jonas Jan 28 '16 at 13:21