
I'm implementing an adaptive (approximate) matrix-vector multiplication for very large systems with known sparsity structure - see Predicting runtime of parallel loop using a-priori estimate of effort per iterand (for given number of workers) for a more long-winded description. I first determine the entries I need to calculate for each block, but even though these entries are only a small subset, calculating them directly (with quadrature) would be prohibitively expensive. However, they are characterised by an underlying structure (the difference of their respective modulations), which means I only need to perform the quadrature once per "equivalence class"; I obtain these classes by calling unique on a large 2xN matrix of differences and then map the results back to the original entries.
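To make the mapping step concrete, here is a minimal sketch of the idea (the data and the per-class computation are stand-ins; in the real code D holds the modulation differences and the per-class step is the expensive quadrature):

D = randi(5, 2, 10);               % stand-in for the 2xN matrix of modulation differences
[C, ~, ic] = unique(D', 'rows');   % one row of C per equivalence class of differences
q = sum(C, 2);                     % stand-in for the quadrature, one value per class
vals = q(ic);                      % map each class result back to its original entry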

Unfortunately, this 2xN matrix becomes so large in practice that the unique call is becoming somewhat of a bottleneck in my code - which is still orders of magnitude faster than calculating the quadrature redundantly, but annoying nevertheless, since it could run faster in principle.

The problem is that the cluster on which I compute requires the -singleCompThread option, so that Matlab doesn't spread threads where it shouldn't. This means that unique is forced to use only one core, even though I could arrange within my code for it to be called serially (this procedure has to be completed for all relevant blocks anyway).

My search for a solution has led me to the function maxNumCompThreads, but it is deprecated and will be removed in a future release (aside from throwing warnings every time it's called), so I didn't pursue it further.
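For reference, the approach I decided against would have looked roughly like this (the thread count is arbitrary, and every call to maxNumCompThreads issues a deprecation warning):

maxNumCompThreads(8);              % temporarily allow implicit multithreading again
[C, ~, ic] = unique(D', 'rows');
maxNumCompThreads(1);              % restore the single-threaded setting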

It is also possible to pass a function to a batch job and specify a cluster and a pool size it should run on (e.g. j=batch(cluster,@my_unique,3,{D,'cols'},'matlabpool',127); this is 2013a; in 2013b, the key 'matlabpool' changed to 'Pool'), but the problem is that batch opens a new pool. In my current setup, I can have a permanently open pool on the cluster, and it would take a lot of unnecessary time to always open and shut pools for batch (aside from the fact that the maximal size of the pool I could open would decrease).
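Spelled out, the batch route would look roughly like this (R2013a syntax; the profile name is made up):

c = parcluster('myClusterProfile');                           % hypothetical cluster profile
j = batch(c, @my_unique, 3, {D, 'cols'}, 'matlabpool', 127);  % opens its own pool of 127 workers
wait(j);                                                      % block until the job finishes
out = fetchOutputs(j);                                        % {C, ia, ic} from my_unique
delete(j);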

What I'd like is to call unique in such a way that it takes advantage of the currently open matlabpool, without requesting new pools on, or submitting jobs to, the cluster.

Any ideas? Or is this impossible?

Best regards, Axel

PS: It is completely unfathomable to me why the standard set functions in Matlab have a 'rows' option but not a 'cols' option, especially since this would "cost" about 5 lines of code within each function. This is the reason for my_unique:

function varargout=my_unique(a,elem_type,varargin)
% Adapt unique to be able to deal with columns as well

% Inputs:
%   a:
%       Set of which the unique values are sought
%   elem_type (optional, default='scalar'):
%       Parameter determining which kind of unique elements are sought.
%       Possible arguments are 'scalar', 'rows' and 'cols'.
%   varargin (optional):
%       Any valid combination of optional arguments that can be passed to
%       unique (with the exception of 'rows' if elem_type is either 'rows'
%       or 'cols')
%
% Outputs:
%   varargout:
%       Same outputs as unique

if nargin < 2; elem_type='scalar'; end
if ~any(strcmp(elem_type,{'scalar','rows','cols'}))
    error('Unknown Flag')
end

varargout=cell(1,max(nargout,1));

switch (elem_type)
    case 'scalar'
        [varargout{:}]=unique(a,varargin{:});
    case 'rows'
        [varargout{:}]=unique(a,'rows',varargin{:});
    case 'cols'
        [varargout{:}]=unique(transpose(a),'rows',varargin{:});
        varargout=cellfun(@transpose,varargout,'UniformOutput',false);
end

end
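For example, applied to a small 2xN matrix:

D = [1 2 1 3 2; 4 5 4 6 5];          % small illustrative matrix
[C, ia, ic] = my_unique(D, 'cols');  % C holds the unique columns of D
isequal(D, C(:, ic))                 % true: ic maps each column back to its class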

2 Answers


Without having tried the example you cited above, you could try blockproc to do block processing. It does, however, belong to the Image Processing Toolbox.

Lokesh A. R.

Leaving aside the 'rows' problem for the time being, if I've understood correctly, what you're after is a way to use an open parallel pool to do a large call to 'unique'. One option may be to use distributed arrays. For example, you could do:

spmd
    A = codistributed(randi([1 100], 1e6, 2)); % codistributed test data, already transposed to Nx2
    r = unique(A, 'rows');                     % operates in parallel across the pool
end

This works because sortrows is implemented for codistributed arrays. You'll find that you only get speedup from (co)distributed arrays if you can arrange for the data always to live on the cluster, and also when the data is so large that processing it on one machine is infeasible.
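If the data already exists on the client (D below stands for your 2xN matrix), an untested variant of the same idea is to distribute it over the open pool and gather only the result; this assumes unique(...,'rows') dispatches to the codistributed implementation just mentioned:

Dt = distributed(D');              % Nx2, spread across the workers of the open pool
U  = gather(unique(Dt, 'rows'));   % unique rows, collected back on the client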

Edric
  • Thanks for the tip, maybe that is a feasible solution - I'll check it out as soon as possible. I've known about `spmd` but didn't look into it because a), I had misremembered it to mean "single processor, multiple data" instead of "single **program**, multiple data", and b), because I felt the "multiple data" part is the opposite of what I'm looking for - i.e. setting many workers on a single datum (of large size). – Axel Dec 04 '13 at 11:48
  • I've tried your code (with `5e7` instead of `1e6`) with different numbers of workers (3,6,12,64,128) within open matlabpools, both a local pool as well as a cluster, and observe no scaling whatsoever, unfortunately. – Axel Dec 04 '13 at 13:56