My organization is usually a few years behind the most recent Matlab version. I am finding that splitapply
is extremely slow when there are many groups (two numerical grouping variables), in sharp contrast to my experience with SQL. I suspect that it loops through all the groups. Whatever the cause, for long term planning purposes, I was wondering if anyone can comment on whether this is a problem in post-2015b Matlab versions?
Here is some benchmarking code. The results aren't as extreme as for my real problem, but it still shows the disparity in execution time.
clear all
% This data is more like mine and takes splitapply 3+ minutes to run
%-------------------------------------------------------------------
% tDat = array2table( ...
% floor([ rand(2e6,1) 20e3*rand(2e6,1) 50*rand(2e6,1) ]) , ...
% 'VariableNames' , {'data2add','groupVar1','groupVar2'} );
% This data runs way faster than mine, but still illustrates the problem
%------------------------------------------------------------------------
tDat = array2table( floor(100*rand(2e6,3)) , ...
'VariableNames' , ...
{'data2add','groupVar1','groupVar2'} );
[ G , tRollup ] = findgroups( tDat(:,{'groupVar1','groupVar2'}) );
tic
tRollup.total_sa = splitapply( @sum, tDat.data2add, G );
disp('Done splitapply')
toc
fprintf('\n')
tic
tRollup.total_aa1 = accumarray( G, tDat.data2add );
disp('Done accumarray #1')
toc
fprintf('\n')
tic
tRollup.total_aa2 = accumarray( G, tDat.data2add, [], @sum );
disp('Done accumarray #2')
toc
fprintf('\n')
% Confirm that results are equivalent
if isequal( tRollup.total_sa , tRollup.total_aa1 )
disp('tRollup.total_sa == tRollup.total_aa1')
else
disp('tRollup.total_sa ~= tRollup.total_aa1')
end
if isequal( tRollup.total_aa1 , tRollup.total_aa2 )
disp('tRollup.total_aa1 == tRollup.total_aa2')
else
disp('tRollup.total_aa1 ~= tRollup.total_aa2')
end
The output is:
Done splitapply
Elapsed time is 2.550241 seconds.
Done accumarray #1
Elapsed time is 0.021673 seconds.
Done accumarray #2
Elapsed time is 0.020397 seconds.
tRollup.total_sa == tRollup.total_aa1
tRollup.total_aa1 == tRollup.total_aa2