I'm implementing an adaptive (approximate) matrix-vector multiplication for very large systems with known sparsity structure - see Predicting runtime of parallel loop using a-priori estimate of effort per iterand (for given number of workers) for a more long-winded description. I first determine the entries I need to calculate for each block, but even though these entries are only a small subset, calculating them directly (with quadrature) would be impossibly expensive. However, they are characterised by an underlying structure (the difference of their respective modulations), which means I only need to calculate the quadrature once per "equivalence class". I obtain these classes by calling unique on a large 2xN matrix of differences (and then mapping back to the original entries).
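To make the equivalence-class idea concrete, here is a minimal sketch with toy data; expensive_quadrature is a hypothetical stand-in for my actual quadrature routine:

```matlab
% D is the 2xN matrix of modulation differences; each distinct column
% is one equivalence class.
D = [1 2 1 3 2; 0 1 0 1 1];                 % toy data
[C, ~, ic] = unique(D', 'rows');            % classes as rows of C; ic maps entries to classes
q = zeros(size(C, 1), 1);
for k = 1:size(C, 1)
    q(k) = expensive_quadrature(C(k, :)');  % hypothetical: quadrature once per class
end
q_all = q(ic);                              % map class results back to all N entries
```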
Unfortunately, this 2xN matrix becomes so large in practice that it has become something of a bottleneck in my code - which is still orders of magnitude faster than calculating the quadrature redundantly, but annoying nevertheless, since it could run faster in principle.
The problem is that the cluster on which I compute requires the -singleCompThread option, so that Matlab doesn't spread threads where it shouldn't. This forces unique to use only one core, even though I could arrange within the code for it to be called serially (as this procedure must be completed for all relevant blocks).
My search for a solution has led me to the function maxNumCompThreads, but it is deprecated and will be removed in a future release (aside from throwing warnings every time it's called), so I didn't pursue it further.
It is also possible to pass a function to a batch job and specify a cluster and a pool size it should run on (e.g. j=batch(cluster,@my_unique,3,{D,'cols'},'matlabpool',127); this is 2013a syntax - in 2013b, the key 'matlabpool' changed to 'Pool'). The problem is that batch opens a new pool. In my current setup, I can have a permanently open pool on the cluster, and it would take a lot of unnecessary time to repeatedly open and close pools for batch (aside from the fact that the maximal size of the pool I could open would decrease).
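For reference, the batch variant would look roughly like this (the profile name and pool size are placeholders from my setup):

```matlab
clust = parcluster('myClusterProfile');          % placeholder profile name
% R2013a key:
j = batch(clust, @my_unique, 3, {D, 'cols'}, 'matlabpool', 127);
% R2013b renamed the key:
% j = batch(clust, @my_unique, 3, {D, 'cols'}, 'Pool', 127);
wait(j);
out = fetchOutputs(j);                           % {C, ia, ic}
delete(j);                                       % tears down the job and its pool
```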
What I'd like is to call unique in such a way that it takes advantage of the currently open matlabpool, without requesting new pools from, or submitting new jobs to, the cluster.
Any ideas? Or is this impossible?
Best regards, Axel
Ps. It is completely unfathomable to me why the standard set functions in Matlab have a 'rows' option but not a 'cols' option, especially since the latter would "cost" about five lines of code within each function. This is the reason for my_unique:
function varargout = my_unique(a, elem_type, varargin)
% Adapt unique to be able to deal with columns as well
% Inputs:
%   a:
%     Set of which the unique values are sought
%   elem_type (optional, default='scalar'):
%     Parameter determining which kind of unique elements are sought.
%     Possible arguments are 'scalar', 'rows' and 'cols'.
%   varargin (optional):
%     Any valid combination of optional arguments that can be passed to
%     unique (with the exception of 'rows' if elem_type is either 'rows'
%     or 'cols')
%
% Outputs:
%   varargout:
%     Same outputs as unique

if nargin < 2; elem_type = 'scalar'; end
if ~any(strcmp(elem_type, {'scalar', 'rows', 'cols'}))
    error('Unknown Flag')
end
varargout = cell(1, max(nargout, 1));
switch elem_type
    case 'scalar'
        [varargout{:}] = unique(a, varargin{:});
    case 'rows'
        [varargout{:}] = unique(a, 'rows', varargin{:});
    case 'cols'
        [varargout{:}] = unique(transpose(a), 'rows', varargin{:});
        varargout = cellfun(@transpose, varargout, 'UniformOutput', false);
end
end
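A quick usage example with toy data (assuming the default sorted order of unique; note that the 'cols' branch transposes all outputs, so ia and ic come back as row vectors):

```matlab
D = [1 2 1 3; 0 1 0 1];
[C, ia, ic] = my_unique(D, 'cols');
% C is [1 2 3; 0 1 1]: the distinct columns of D, sorted
% ic maps each original column to its class, so C(:, ic)
% reconstructs D column by column
isequal(C(:, ic), D)                 % returns 1 (true)
```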