
I've been trying to serialize the sparse matrix class from the Armadillo C++ library. I am doing some large-scale numerical computations in which the data get stored in a sparse matrix, and I'd like to gather these matrices using MPI (the Boost implementation) and sum the matrices coming from different nodes. Where I'm stuck is how to send a sparse matrix from one node to the others. Boost's documentation says that to send a user-defined object (SpMat in this case), it needs to be serialized.

Boost's documentation gives a good tutorial on how to serialize a user-defined type, and I can serialize some basic classes. However, Armadillo's SpMat class is too complicated for me to understand and serialize.

I've come across a few questions and their very elegant answers:

  1. This answer by Ryan Curtin, the co-author of Armadillo and author of mlpack, shows a very elegant way to serialize the Mat class.
  2. This answer by sehe shows a very simple way to serialize a sparse matrix.

Using the first, I can mpi::send a Mat object to another node in the communicator, but using the second, mpi::send doesn't work for a sparse matrix.

This is adapted from the second linked answer:

#include <iostream>
#include <boost/serialization/complex.hpp>
#include <boost/serialization/split_member.hpp>
#include <fstream>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
#include <armadillo>
#include <boost/mpi.hpp>
namespace mpi = boost::mpi;
using namespace std;
using namespace arma;

namespace boost { 
    namespace serialization {

        template<class Archive>
            void save(Archive & ar, const arma::sp_mat &t, unsigned) {
                ar & t.n_rows;
                ar & t.n_cols;
                for (auto it = t.begin(); it != t.end(); ++it) {
                    ar & it.row() & it.col() & *it;
                }
            }

        template<class Archive>
            void load(Archive & ar, arma::sp_mat &t, unsigned) {
                uint64_t r, c;
                ar & r;
                ar & c;
                t.set_size(r, c);
                for (auto it = t.begin(); it != t.end(); ++it) {
                    double v;
                    ar & r & c & v;
                    t(r, c) = v;
                }
            }
    }}
BOOST_SERIALIZATION_SPLIT_FREE(arma::sp_mat)

int main(int argc, char *argv[])
{
    mpi::environment env(argc, argv);
    mpi::communicator world;
    arma::mat C(3,3, arma::fill::randu);
    C(1,1) = 0; //example so that a few of the components are u
    C(1,2) = 0;
    C(0,0) = 0;
    C(2,1) = 0;
    C(2,0) = 0;
    sp_mat A;
    if(world.rank() == 0) 
    {
        A = arma::sp_mat(C);
    }

    broadcast(world,A,0);

    if(world.rank() ==1 ) cout << A << endl;

    return 0;
}

I'm compiling like this

$ mpicxx -L ~/boost_1_73_0/stage/lib  -lboost_mpi -lboost_serialization -I ~/armadillo-9.900.1/include -DARMA_DONT_USE_WRAPPER -lblas -llapack serialize_arma_spmat.cpp -o serialize_arma_spmat

$ mpirun -np 2 serialize_arma_spmat
[matrix size: 3x3; n_nonzero: 0; density: 0%]

Process no. 2 (rank 1) didn't print the expected matrix A, so the broadcast didn't work.

I couldn't try to build on Ryan's answer as I couldn't understand the sparse matrix implementation in "SpMat_Meat.hpp" in Armadillo which is very different from the Mat class.

How can I serialize a sparse matrix with Boost so that it can be used with boost::mpi?

Galilean

2 Answers


I hate to say so, but that answer by that sehe guy was just flawed. Thanks for finding it.

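The problem was that it didn't store the number of non-zero cells during serialization, so `load`, which iterated over the freshly sized and therefore empty target matrix, read back nothing. Oops. I don't know how I overlooked this when testing.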

(Looks like I had several versions and must have patched together a Frankenversion of it that wasn't actually properly tested).

I also threw in a test that the matrix is cleared, so that if you deserialize into an instance that has the right shape but isn't empty, you don't end up with a mix of old and new data.
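To see the failure mode without Armadillo or Boost in the picture, here is a toy sketch using only the standard library. Everything in it is hypothetical and for illustration only: `ToySparse` (a `std::map` standing in for `arma::sp_mat`), `save_toy`, `load_toy_flawed`, `load_toy` and `demo` are made-up names, and the text format is not what Boost's archives produce. It only shows the two points above: the loader needs the stored entry count, and it should clear the target first.

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <utility>

using Index = std::uint64_t;

// Hypothetical stand-in for a sparse matrix: (row, col) -> value.
struct ToySparse {
    Index n_rows = 0, n_cols = 0;
    std::map<std::pair<Index, Index>, double> cells;
};

// Write the shape, then the entry count, then one (row, col, value) triplet per entry.
void save_toy(std::ostream& os, const ToySparse& m) {
    os << std::setprecision(17);
    os << m.n_rows << ' ' << m.n_cols << ' ' << m.cells.size() << '\n';
    for (const auto& cell : m.cells)
        os << cell.first.first << ' ' << cell.first.second << ' ' << cell.second << '\n';
}

// Flawed loader: it loops over the target's own entries. The target is empty,
// so the loop body never runs and no triplet is read.
void load_toy_flawed(std::istream& is, ToySparse& m) {
    Index nz_unused = 0;
    is >> m.n_rows >> m.n_cols >> nz_unused;
    m.cells.clear();
    for (auto it = m.cells.begin(); it != m.cells.end(); ++it) {
        Index r = 0, c = 0;
        double v = 0.0;
        is >> r >> c >> v;
        m.cells[{r, c}] = v;
    }
}

// Fixed loader: clear stale data, then read exactly as many triplets as were stored.
void load_toy(std::istream& is, ToySparse& m) {
    Index nz = 0;
    is >> m.n_rows >> m.n_cols >> nz;
    m.cells.clear();
    while (nz--) {
        Index r = 0, c = 0;
        double v = 0.0;
        is >> r >> c >> v;
        m.cells[{r, c}] = v;
    }
}

// Round-trip check: the fixed loader restores all entries and drops stale ones,
// while the flawed loader restores only the shape.
void demo() {
    ToySparse a;
    a.n_rows = 3;
    a.n_cols = 3;
    a.cells[{1, 0}] = 0.25;
    a.cells[{0, 1}] = 0.5;
    a.cells[{0, 2}] = 0.75;
    a.cells[{2, 2}] = 0.125;

    std::stringstream good, bad;
    save_toy(good, a);
    save_toy(bad, a);

    ToySparse b;
    b.n_rows = 3;
    b.n_cols = 3;
    b.cells[{0, 0}] = 77.0; // stale entry that must not survive loading
    load_toy(good, b);
    assert(b.cells == a.cells);

    ToySparse c;
    load_toy_flawed(bad, c);
    assert(c.n_rows == 3 && c.cells.empty());
}
```

Calling `demo()` from a `main` exercises both loaders. The flawed one ends up in the same state as the question's output: the shape is right, but there are no entries.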

FIXED

#include <armadillo>
#include <boost/archive/binary_iarchive.hpp>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/serialization/split_free.hpp> // defines BOOST_SERIALIZATION_SPLIT_FREE
#include <cassert>
#include <fstream>
#include <iostream>

BOOST_SERIALIZATION_SPLIT_FREE(arma::sp_mat)

namespace boost { namespace serialization {

    template<class Archive>
    void save(Archive & ar, const arma::sp_mat &t, unsigned) {
        ar & t.n_rows & t.n_cols & t.n_nonzero;

        for (auto it = t.begin(); it != t.end(); ++it) {
            ar & it.row() & it.col() & *it;
        }
    }

    template<class Archive>
    void load(Archive & ar, arma::sp_mat &t, unsigned) {
        arma::uword r, c, nz; // same type that save() writes for n_rows, n_cols, n_nonzero
        ar & r & c & nz;

        t.zeros(r, c);
        while (nz--) {
            double v;
            ar & r & c & v;
            t(r, c) = v;
        }
    }
}} // namespace boost::serialization

int main() {

    arma::mat C(3, 3, arma::fill::randu);
    C(0, 0) = 0;
    C(1, 1) = 0; // example so that a few of the components are u
    C(1, 2) = 0;
    C(2, 0) = 0;
    C(2, 1) = 0;

    {
        arma::sp_mat const A = arma::sp_mat(C);
        assert(A.n_nonzero == 4);

        A.print("A: ");
        std::ofstream outputStream("bin.dat", std::ios::binary);
        boost::archive::binary_oarchive oa(outputStream);
        oa& A;
    }

    {
        std::ifstream inputStream("bin.dat", std::ios::binary);
        boost::archive::binary_iarchive ia(inputStream);

        arma::sp_mat B(3,3);
        B(0,0) = 77; // some old data should be cleared

        ia& B;

        B.print("B: ");
    }
}

Now correctly prints

A:
[matrix size: 3x3; n_nonzero: 4; density: 44.44%]

     (1, 0)         0.2505
     (0, 1)         0.9467
     (0, 2)         0.2513
     (2, 2)         0.5206

B:
[matrix size: 3x3; n_nonzero: 4; density: 44.44%]

     (1, 0)         0.2505
     (0, 1)         0.9467
     (0, 2)         0.2513
     (2, 2)         0.5206
sehe
  • This still doesn't look right. The [.set_size()](http://arma.sourceforge.net/docs.html#set_size) member function is not guaranteed to clear an existing matrix. If the matrix already has the requested size, .set_size() does nothing and you could end up with unwanted data. Consider using [.zeros()](http://arma.sourceforge.net/docs.html#zeros_member) instead, which explicitly sets the elements to zero. – hbrerkere Apr 27 '21 at 07:06
  • @hbrerkere if you look at the behaviour of the example shown, you see that I tested **exactly** that. Did I miss something? But, yeah, `zeros()` might be good (didn't read the docs of that). Thanks for the doc link, that's pretty definitive. Fixing code – sehe Apr 27 '21 at 12:00
  • Nice. Any chance you can make it use Cereal instead of Boost, just like current mlpack? – Dirk Eddelbuettel Apr 27 '21 at 12:34
  • @DirkEddelbuettel I bet I can. But I don't have cereal handy. Maybe someone should ask ( & answer) that question :) – sehe Apr 27 '21 at 12:36

A version with cereal and Rcpp. For plain C++, remove the Rcpp parts (the `// [[Rcpp::...]]` attribute comments, and include `<armadillo>` rather than `<RcppArmadillo.h>`). Thanks to @sehe for introducing this example with Boost.

// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::depends(Rcereal)]]
#include <cassert>
#include <iostream>
#include <fstream>
#include <cereal/archives/binary.hpp>
#include <cereal/access.hpp>
#include <RcppArmadillo.h>


namespace arma {

  template<class Archive>
  void save(Archive & ar, const arma::sp_mat &t){
    ar(t.n_rows, t.n_cols, t.n_nonzero);
    
    for (auto it = t.begin(); it != t.end(); ++it) {
      ar(it.row(), it.col(), *it);
    }
  }

  template<class Archive>
  void load(Archive & ar, arma::sp_mat &t) {
    arma::uword r, c, nz;
    ar(r, c, nz);
    
    t.zeros(r, c);
    while (nz--) {
      double v;
      ar(r, c, v); // read straight into r and c; the stored indices are already zero-based
      t(r, c) = v;
    }
  }
}



// [[Rcpp::export]]
int main() {
  
  { // Serialize
    arma::mat C(3, 3, arma::fill::randu);
    C(0, 0) = 0;
    C(1, 1) = 0; // example so that a few of the components are u
    C(1, 2) = 0;
    C(2, 0) = 0;
    C(2, 1) = 0;
    arma::sp_mat const spA = arma::sp_mat(C);
    assert(spA.n_nonzero == 4);
    spA.print("spA: ");
    

    std::ofstream os("Backend/Serialize_Arma.bin", std::ios::binary);
    cereal::BinaryOutputArchive oarchive(os);
    oarchive(spA);
  }
  
  // .... put put put ... 
  
  { // Deserialize
    arma::sp_mat spB;
    
    std::ifstream is("Backend/Serialize_Arma.bin", std::ios::binary);
    cereal::BinaryInputArchive iarchive(is);
    iarchive(spB);
    
    spB.print();
  }
  
  return 0;
}
G4lActus