Sampling from a boolean matrix in Eigen

Question

I have a matrix A of this form:

Eigen::Matrix<bool, n, m> A(n, m);

and I want to obtain a random element among the ones that are 'true'. The silly way to do that would be to obtain the number of 'true' elements t, generate a random number between 1 and t and iterate:

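// r = random number in [1, t]
int k = 0;
bool found = false;
for (int i = 0; i < A.rows() && !found; ++i)
    for (int j = 0; j < A.cols(); ++j)
    {
        // Count only 'true' entries and stop at the r-th one.
        if (A(i, j) && ++k == r)
        {
            std::cout << "(" << i << ", " << j << ")" << std::endl;
            found = true;
            break;
        }
    }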

This solution is incredibly slow when multiple samples are needed and the matrix is big. Any suggestion as to how I should go about this?

In short: I'd like to find an efficient way to obtain the i-th 'true' element of the above matrix.

kangshiyin · Accepted Answer · 2016-06-10T20:36:17.380

You could use Eigen::SparseMatrix instead.

Eigen::SparseMatrix<bool> A(n, m);

With its compressed (or uncompressed) column-major or row-major storage scheme, you can find the r-th non-zero element in O(m) or O(n) time by scanning the outer index array, or in O(log(m)) with a binary search over it.
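A minimal sketch of the binary-search lookup, assuming column-major compressed storage. The function name `nth_stored_entry` is hypothetical; it works on the raw offset and index arrays rather than on an Eigen type, so it only illustrates the idea.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Locate the r-th stored entry (0-based) of a column-major compressed matrix.
// outer: cols + 1 offsets, where outer[j]..outer[j + 1] delimits column j.
// inner: row index of each stored entry, in storage order.
// Returns (row, col).
std::pair<int, int> nth_stored_entry(const int* outer, const int* inner,
                                     int cols, int r)
{
    assert(r >= 0 && r < outer[cols]);  // r must address a stored entry
    // The entry lives in the first column whose end offset exceeds r.
    const int* it = std::upper_bound(outer + 1, outer + cols + 1, r);
    const int col = static_cast<int>(it - (outer + 1));
    return {inner[r], col};
}
```

With a default (column-major, `int`-indexed) `Eigen::SparseMatrix<bool> A` on which `A.makeCompressed()` has been called, the arguments would be `A.outerIndexPtr()`, `A.innerIndexPtr()` and `A.cols()`. Note that this counts stored entries, so an explicitly stored `false` would be counted as well.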

Alternatively, you could store the non-zeros in COO format using Eigen::Triplet, so that the r-th non-zero element is found in O(1) time by indexing the vector.

std::vector<Eigen::Triplet<bool> > a(num_nonzeros);

And yes, since it's a bool matrix, storing the values is unnecessary too.
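Since the values are redundant, a plain vector of (row, column) pairs is enough. The sketch below is one possible way to build it and draw a uniform sample; the names `true_indices` and `sample_true` are hypothetical, and the functions are templated so they accept any type exposing `rows()`, `cols()` and `operator()(i, j)`, which a dense Eigen matrix of bool does.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Collect the (row, col) position of every true entry, column by column.
template <typename BoolMatrix>
std::vector<std::pair<int, int>> true_indices(const BoolMatrix& A)
{
    std::vector<std::pair<int, int>> idx;
    for (int j = 0; j < A.cols(); ++j)
        for (int i = 0; i < A.rows(); ++i)
            if (A(i, j))
                idx.emplace_back(i, j);
    return idx;
}

// Draw one stored position uniformly at random in O(1); idx must be non-empty.
template <typename Rng>
std::pair<int, int> sample_true(const std::vector<std::pair<int, int>>& idx,
                                Rng& rng)
{
    assert(!idx.empty());
    std::uniform_int_distribution<std::size_t> pick(0, idx.size() - 1);
    return idx[pick(rng)];
}
```

Building the vector costs one full pass over the matrix, so this pays off when many samples are drawn from the same matrix. A typical call would pass a `std::mt19937` seeded once and reused across samples.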

You can even simply build a `std::vector<std::pair<int,int>>` containing the set of valid `(i,j)` indices. No need for triplets here. – ggael Jun 10 '16 at 19:33
These are great solutions but I need the dense matrix form to do other operations that require me to look at the neighborhood of the samples I take. This means that I would need to create a second structure and update between the two, which would be easy in the SparseMatrix form (but I was looking more at O(1) solutions), but not so much for the others. Ideas? – user6451056 Jun 10 '16 at 20:16
@user6451056 It is C++, so you can always create your own data structure for best performance. For example, using a COO vector with an additional `OuterStarts` array similar to the one in `Eigen::SparseMatrix`, you could still access the neighborhood in almost O(1) time (actually O(log(n)) if you use binary search within a column/row). The only remaining question is: is this really your performance bottleneck? – kangshiyin Jun 10 '16 at 20:32
It is. I'll try out this option. Thank you! – user6451056 Jun 10 '16 at 20:54
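The custom structure suggested in the comments could look roughly like the sketch below. The type `TrueCells` and its member names are hypothetical; it assumes the row indices are grouped by column and sorted within each column, with a per-column offset array playing the role of `OuterStarts`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical COO-style container for the true cells of a boolean matrix.
struct TrueCells {
    std::vector<int> rows;    // row index of each true cell, sorted per column
    std::vector<int> starts;  // starts[j]..starts[j + 1] delimits column j

    // Neighborhood query: is cell (i, j) true?
    // Binary search within one column, so O(log n) in the column length.
    bool contains(int i, int j) const
    {
        if (j < 0 || j + 1 >= static_cast<int>(starts.size()))
            return false;  // column outside the matrix
        assert(starts[j] <= starts[j + 1]);
        return std::binary_search(rows.begin() + starts[j],
                                  rows.begin() + starts[j + 1], i);
    }
};
```

Checking the neighbors of a sampled cell (i, j) then amounts to a handful of `contains` calls on (i ± 1, j ± 1). This sketch covers lookups only; keeping `rows` and `starts` up to date when cells change would cost more than O(1).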
