How to efficiently initialize a SparseVector in Eigen

Question

The Eigen documentation on filling a sparse matrix recommends the triplet-list method, since it can be much more efficient than repeated calls to coeffRef, each of which involves a binary search.

For SparseVector, however, there is no clear recommendation on how to fill it efficiently.

The method suggested in this SO answer uses coeffRef, which means a binary search is performed for every insertion.

Is there a recommended, efficient way to build sparse vectors? Should I try to create a single-row SparseMatrix and then store that as a SparseVector?

My use case is reading in LibSVM files, which can contain millions of very sparse features and billions of data points. I'm currently representing these as a std::vector<Eigen::SparseVector<float>>. Perhaps I should just use SparseMatrix instead?

Edit: One thing I've tried is this:

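```cpp
// Build one SparseVector per data point in the batch.
Eigen::SparseMatrix<float> features(1, num_features);
typedef Eigen::Triplet<float> T;
std::vector<T> tripletList;
for (int j = 0; j < num_batch_instances; ++j) {
  // Start each sample with an empty list; otherwise the entries of all
  // earlier samples would be added to this sample's matrix as well.
  tripletList.clear();
  // Copy the entries of sample j over.
  for (size_t i = batch.offset[j]; i < batch.offset[j + 1]; ++i) {
    uint32_t index = batch.index[i];
    float fvalue = batch.value[i];  // value of the i-th stored entry
    if (index < num_features) {     // skip features outside the expected range
      tripletList.emplace_back(0, index, fvalue);
    }
  }
  // Build the 1 x num_features row, then convert it to a SparseVector.
  features.setFromTriplets(tripletList.begin(), tripletList.end());
  samples->emplace_back(Eigen::SparseVector<float>(features));
}
```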

This creates a SparseMatrix using the triplet-list approach, then creates a SparseVector from that object. In my experiments with ~1.4M features and very high sparsity, this is 2 orders of magnitude slower than using SparseVector and coeffRef, which I definitely did not expect.

If they are already sorted, just properly reserve space and then call `vec.insertBack(i) = ...;` — ggael, Oct 11 '18 at 13:01
Hello @ggael, by "sorted" do you mean that the features in my data file appear in increasing index order? LibSVM files _should_ guarantee that. — Bar, Oct 11 '18 at 14:42
Hello @ggael, I tried this approach and I'm running into an unexpected error: `include/Eigen/src/SparseCore/SparseVector.h:184: Eigen::SparseVector<_Scalar, _Flags, _StorageIndex>::Scalar& Eigen::SparseVector<_Scalar, _Flags, _StorageIndex>::insert(Eigen::Index) [with _Scalar = float; int _Options = 0; _StorageIndex = int; Eigen::SparseVector<_Scalar, _Flags, _StorageIndex>::Scalar = float; Eigen::Index = long int]: Assertion 'i>=0 && i — Bar, Oct 11 '18 at 15:14
