
I have the following matrix (M×N, where M ≤ N):

0.8147    0.9134    0.2785    0.9649
0.9058    0.6324    0.5469    0.1576
0.1270    0.0975    0.9575    0.9706

From each row, I want to select one column entry, given by the following indices:

idx = [ 3  1  4 ];

This means we keep the elements at (1,3), (2,1) and (3,4), and the rest of the array should be zero.

For the example above, I would get the following output:

     0         0    0.2785         0
0.9058         0         0         0
     0         0         0    0.9706

I currently generate this using a loop, which becomes slow as the matrix grows.

Can anyone suggest a more performant approach?

– Frey

5 Answers


You can use the sub2ind function to convert your subscript indices into linear indices.

With linear indices, MATLAB treats the matrix as one long column vector (column-major order).
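For example, a quick self-contained illustration (A here is just a throwaway matrix):

A = reshape(1:12, 3, 4);   % 3-by-4 matrix: columns are [1;2;3], [4;5;6], ...
A(7) == A(1,3)             % true: linear index 7 is row 1, column 3, since 7 = 1 + (3-1)*3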

org_mat=[0.8147    0.9134    0.2785    0.9649
0.9058    0.6324    0.5469    0.1576
0.1270    0.0975    0.9575    0.9706];
entries=[3,1,4];

linear_entries=sub2ind(size(org_mat),1:length(entries),entries);
new_mat=zeros(size(org_mat));
new_mat(linear_entries)=org_mat(linear_entries);
– Prostagma
  • I really appreciate this solution, thanks @RadioJava, and I indicate this as a useful answer too. – Frey Aug 06 '18 at 01:30

There is some discussion in the other answers/comments about performance. This is one of the situations where a simple (well-constructed) for loop will do the job nicely, with basically no impact on performance.

% For some original matrix 'm', and column indexing array 'idx':
x = zeros( size(m) ); % Initialise output of zeros
for ii = 1:numel(idx) % Loop over indices
    % Assign the value at the column index for this row
    x( ii, idx(ii) ) = m( ii, idx(ii) );     
end

This code is highly readable and quick. To justify "quick", I've written the benchmarking code below for all 4 current answers' methods, and run it on MATLAB R2017b. Here are the output plots.

  • For "small" matrices, up to 2^5 columns and 2^4 rows:

[Figure: timings for small matrices]

  • For "large" matrices, up to 2^15 columns and 2^14 rows (same plot with and without the bsxfun solution because it ruins the scaling):

[Figure: timings for large matrices, including bsxfun]

[Figure: timings for large matrices, bsxfun omitted]

The first plot is perhaps slightly misleading. Although the result is consistent (the slow-to-fast ranking is bsxfun, then sub2ind, then manual indices, then the loop), the y-axis is on the order of 10^(-5) seconds, so it's basically immaterial which method you use!

The second plot shows that, for large matrices, the methods are basically equivalent, except for bsxfun, which is terrible (and, though not shown here, requires much more memory).

I'd opt for the clearer loop; it gives you more flexibility, and you'll remember exactly what it's doing when you revisit the code two years from now.


Benchmarking code:

function benchie() 
    K = 5;                      % Max loop variable
    T = zeros( K, 4 );          % Timing results
    for k = 1:K
        M = 2^(k-1); N = 2^k;   % size of matrix
        m = rand( M, N );       % random matrix
        idx = randperm( N, M ); % column indices

        % Define anonymous functions with no inputs for timeit, and run
        f1 = @() f_sub2ind( m, idx ); T(k,1) = timeit(f1);
        f2 = @() f_linear( m, idx );  T(k,2) = timeit(f2);
        f3 = @() f_loop( m, idx );    T(k,3) = timeit(f3);   
        f4 = @() f_bsxfun( m, idx );  T(k,4) = timeit(f4);   
    end
    % Plot results
    plot( (1:K)', T, 'linewidth', 2 );
    legend( {'sub2ind', 'linear', 'loop', 'bsxfun'} );
    xlabel( 'k, where matrix had 2^{(k-1)} rows and 2^k columns' );
    ylabel( 'function time (s)' )
end

function f_sub2ind( m, idx )
    % Using the in-built sub2ind to generate linear indices, then indexing
    lin_idx = sub2ind( size(m), 1:numel(idx), idx );
    x = zeros( size(m) );
    x( lin_idx ) = m( lin_idx );
end
function f_linear( m, idx )
    % Manually calculating linear indices, then indexing
    lin_idx = (1:numel(idx)) + (idx-1)*size(m,1);
    x = zeros( size(m) );
    x( lin_idx ) = m( lin_idx );
end
function f_loop( m, idx )
    % Directly indexing in a simple loop
    x = zeros( size(m) );
    for ii = 1:numel(idx)
        x( ii, idx(ii) ) = m( ii, idx(ii) );
    end
end
function f_bsxfun( m, idx )
    % Using bsxfun to create a logical matrix of desired elements, then masking
    % Since R2016b, can use 'x = ( (1:size(m,2)) == idx(:) ) .* m;'
    x = bsxfun(@eq, 1:size(m,2), idx(:)).*m;
end
– Wolfie

TL;DR - This is my suggestion:

nI = numel(idx);
sz = size(m); 
x = sparse( 1:nI, idx, m(sub2ind( size(m), 1:numel(idx), idx )), sz(1), sz(2), nI);

The rest of the post discusses why it works better.


Seeing how the desired output matrix consists mostly of zeros, this practically begs the use of sparse matrices! Not only should this increase performance (especially for larger matrices), it should also be much more memory-friendly.
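As a rough, illustrative comparison of the memory footprint (the sizes here are arbitrary, not taken from the benchmark):

A = zeros(1000);                 % full 1000-by-1000 double: ~8 MB
S = spalloc(1000, 1000, 1000);   % all-zero sparse with room for 1000 nonzeros: ~24 KB
whos A S                         % compare the Bytes column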

I'm going to add two functions to Wolfie's benchmark:

function x = f_sp_loop( m, idx )
  nI = numel(idx);
  sz = size(m); 
  x = spalloc( sz(1), sz(2), nI ); % Initialize a sparse array.
  for indI = 1:nI
      x( indI, idx(indI) ) = m( indI, idx(indI) ); % This generates a warning (inefficient)
  end
end

function x = f_sp_sub2ind( m, idx )
  nI = numel(idx);
  sz = size(m); 
  x = sparse( 1:nI, idx, m(sub2ind( size(m), 1:numel(idx), idx )), sz(1), sz(2), nI);
end

The difference, essentially, is that we're not preallocating the output as an array of zeros, but rather as a sparse array. The result of the benchmark¹ is as follows:

[Figure: benchmark results including the sparse variants]

... this raises a question: why is the sparse approach about an order of magnitude faster?

To answer this, we should look at the actual runtime distribution inside the benchmarking function, which we can get from the profiler. To get even more information, we can make the profiler output memory consumption info, using profile('-memory','on'). After running a somewhat shorter version² of the benchmark, which only runs for the top value of k, we get:

[Figure: profiler output #1]

[Figure: profiler output #2]
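For reference, a profiler session along these lines can be set up as follows (a sketch; q51605093_MB is the shortened benchmark listed below):

profile('-memory','on');   % enable the profiler with memory statistics
q51605093_MB();            % run the shortened benchmark once
profile viewer             % stop profiling and open the results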

So we can conclude several things:

  1. The vast majority of runtime is spent on allocating and freeing memory, which is why the algorithms appear to have almost identical performance. Therefore, if we cut down on memory allocations, we directly save time (big plus for sparse!).
  2. Even though the sub2ind and loop approaches appear the same, we can still establish a "winner" between the two (see the purple box in the bottom image): sub2ind! It comes in at 32 ms vs 41 ms for the loop.
  3. The sparse loop approach is unsurprisingly slow, as we were warned by mlint (see the sketch after this list):

    Explanation

    Code Analyzer detects an indexing pattern for a sparse array that is likely to be slow. An assignment that changes the nonzero pattern of a sparse array can cause this error because such assignments result in considerable overhead.

    Suggested Action

    If possible, build sparse arrays using sparse as follows, and do not use indexed assignments (such as C(4) = B) to build them:

    1. Create separate index and value arrays.
    2. Call sparse to assemble the index and value arrays.

    If you must use indexed assignments to build sparse arrays, you can optimize performance by first preallocating the sparse array with spalloc.

    If the code changes only array elements that are already nonzero, then the overhead is reasonable. Suppress this message as described in Adjust Code Analyzer Message Indicators and Messages.

    For more information, see “Constructing Sparse Matrices”.

  4. The approach which combines the best of both worlds, meaning the memory savings of sparse and the vectorization of sub2ind, appears to be the best one, with a mere 3 ms runtime!
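For completeness, here is a minimal sketch of the construction the Code Analyzer recommends, using the index/value data from the question's example:

i = 1:3;                      % row indices
j = [3 1 4];                  % column indices
v = [0.2785 0.9058 0.9706];   % values to keep, from the example matrix
S = sparse(i, j, v, 3, 4);    % assemble directly; no indexed assignment needed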

¹ Code for making the chart:

function q51605093()
    K = 15;                     % Max loop variable
    T = zeros( K, 4 );          % Timing results
    for k = 1:K
        M = 2^(k-1); N = 2^k;   % size of matrix
        m = rand( M, N );       % random matrix
        idx = randperm( N, M ); % column indices

        % Define anonymous functions with no inputs, for timeit, and run
        f = cell(4,1);
        f{1} = @() f_sub2ind( m, idx ); 
        f{2} = @() f_loop( m, idx );   
        f{3} = @() f_sp_loop( m, idx );
        f{4} = @() f_sp_sub2ind( m, idx );
        T(k,:) = cellfun(@timeit, f);

        if k == 5 % test equality during one of the runs
          R = cellfun(@feval, f, 'UniformOutput', false);
          assert(isequal(R{:}));
        end
    end
    % Plot results
    figure();
    semilogy( (1:K).', T, 'linewidth', 2 ); grid on; xticks(0:K);
    legend( {'sub2ind', 'loop', 'sp\_loop', 'sp\_sub2ind'}, 'Location', 'NorthWest' );
    xlabel( 'k, where matrix had 2^{(k-1)} rows and 2^k columns' );
    ylabel( 'function time (s)' )    
end

function x = f_sub2ind( m, idx )
    % Using the in-built sub2ind to generate linear indices, then indexing
    lin_idx = sub2ind( size(m), 1:numel(idx), idx );
    x = zeros( size(m) );
    x( lin_idx ) = m( lin_idx );
end

function x = f_loop( m, idx )
    % Directly indexing in a simple loop
    x = zeros( size(m) );
    for ii = 1:numel(idx)
        x( ii, idx(ii) ) = m( ii, idx(ii) );
    end
end

function x = f_sp_loop( m, idx )
  nI = numel(idx);
  sz = size(m); 
  x = spalloc( sz(1), sz(2), nI ); % Initialize a sparse array.
  for indI = 1:nI
      x( indI, idx(indI) ) = m( indI, idx(indI) ); % This generates a warning (inefficient)
  end
end

function x = f_sp_sub2ind( m, idx )
  nI = numel(idx);
  sz = size(m); 
  x = sparse( 1:nI, idx, m(sub2ind( size(m), 1:numel(idx), idx )), sz(1), sz(2), nI);
end

² Code for profiling:

function q51605093_MB()
    K = 15;                 % Max loop variable
    M = 2^(K-1); N = 2^K;   % size of matrix
    m = rand( M, N );       % random matrix
    idx = randperm( N, M ); % column indices

    % Call each function once (storing outputs) so the profiler can measure them
    f = cell(4,1);
    f{1} = f_sub2ind( m, idx ); 
    f{2} = f_loop( m, idx );   
    f{3} = f_sp_loop( m, idx );
    f{4} = f_sp_sub2ind( m, idx );

%     assert(isequal(f{:}));
end

... the rest is the same as above
– Dev-iL
  • Given that sparse matrices aren't as "friendly" as normal matrices, it might be worth chucking in a `full` call at the end for completeness. This would add back in the memory allocation but might be more efficient using it after the sparse allocations rather than dealing in full matrices from the start? – Wolfie Jul 31 '18 at 15:52
  • I really appreciate these suggestions, thanks @Dev-iL, and I indicate this as a useful answer. – Frey Aug 06 '18 at 01:28

No bsxfun no party

Let m be the input matrix and idx be a vector with the column indices. You can build a logical mask from idx and multiply element-wise by m, as follows:

result = bsxfun(@eq, 1:size(m,2), idx(:)).*m;
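To see why this works, here is the mask evaluated for the question's example (idx taken from the question):

idx = [3 1 4];
mask = bsxfun(@eq, 1:4, idx(:))
% mask =
%      0     0     1     0
%      1     0     0     0
%      0     0     0     1

Multiplying element-wise by m then keeps exactly the marked entries and zeroes out the rest.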
– Luis Mendo
  • Which means (especially post-R2016b) this is **by far** the most elegant solution, `result = ((1:size(m,2)) == idx(:)) .* m` – Wolfie Jul 31 '18 at 10:01
  • @Wolfie _and_ the slowest :-P – Luis Mendo Jul 31 '18 at 10:02
  • Just added to my [benchmark](https://stackoverflow.com/a/51609502/3978545), it's pretty comparable for small matrices so I'd favour it for neatness, but for large matrices it's pretty poor! – Wolfie Jul 31 '18 at 10:10
  • I really appreciate this simple solution, thanks @Luis, and I indicate this as a useful answer too. – Frey Aug 06 '18 at 01:29

This should be faster than sub2ind:

m = [0.8147, 0.9134, 0.2785, 0.9649;
0.9058, 0.6324, 0.5469, 0.1576;
0.1270, 0.0975, 0.9575,   0.9706];

n=[3,1,4];

linear = (1:length(n)) + (n-1)*size(m,1);
new_m = zeros(size(m));
new_m(linear) = m(linear);
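Worked out for the example above (3 rows, so size(m,1) = 3):

linear = (1:3) + ([3 1 4] - 1)*3   % = [7 2 12]
% In column-major order, linear indices 7, 2 and 12 are exactly
% the elements (1,3), (2,1) and (3,4).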
– Grossmend