if (world_.rank() == 0)
    std::cout << "test the memory requirements of edges only using own graph" << std::endl;
world_.barrier();
// std::cout << getpid() << std::endl;

auto mems1 = MemoryMonitor::instance().get_all_proc_memory();
if (world_.rank() == 0) {
    std::cout << "before create graph" << std::endl;
    std::cout << mems1 << std::endl;
}

size_type vertex_size = 5000;
size_type per_batch_size = vertex_size / world_.size();

typedef boost::adjacency_list<boost::vecS, boost::distributedS<ProcessGroup, boost::vecS>,
                              boost::bidirectionalS> Graph;

ProcessGroup pg;
Graph g(vertex_size, pg);


Timer t;
for (size_type i = world_.rank() * per_batch_size; i < (world_.rank() + 1) * per_batch_size; i++) {
    for (size_type j = 0; j < vertex_size; j++) {
        Graph::vertex_descriptor from = boost::vertex(i, g);
        Graph::vertex_descriptor to   = boost::vertex(j, g);
        boost::add_edge(from, to, g);
    }
}
synchronize(g);
t.stop();
t.print();

world_.barrier();
auto mems2 = MemoryMonitor::instance().get_all_proc_memory();
if (world_.rank() == 0) {
    std::cout << "after create graph" << std::endl;
    std::cout << mems2 << std::endl;
}

auto total_edge_size = vertex_size * vertex_size;
auto edge_size_per_proc = total_edge_size / world_.size();
std::cout << "edge_size_per_proc:" << edge_size_per_proc << std::endl;
if (world_.rank() == 0) {
    for (auto i = 0; i < world_.size(); i++) {
        std::cout << "mem increase total:" << mems2[i] - mems1[i];
        std::cout << "\tper edge:"
                  << (mems2[i] - mems1[i]) * 1024 * 1024 / edge_size_per_proc
                  << std::endl;
    }
}

Running this returned the results below:

edge_size_per_proc:6250000
mem increase total:1020.52      per edge:171.215
mem increase total:1016.33      per edge:170.512
mem increase total:1029.33      per edge:172.693
mem increase total:1018.72      per edge:170.913
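
(The per-edge figure converts the measured increase from MiB to bytes: e.g. 1020.52 · 1024 · 1024 / 6,250,000 ≈ 171 bytes per edge.)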

It says the memory usage per edge is about 170 bytes. When I tested the memory for vertices, each one only took 4 bytes, so I wonder why it takes 170 bytes to store an edge. I have looked deeper into the Boost source code and found that it keeps two vectors per vertex, for in edges and out edges, with the definition below:

std::vector< boost::adjacency_list<boost::vecS, boost::vecS, boost::undirectedS, MyVertexDescriptor, MyEdgeDescriptor>>

But when I print its size with sizeof, it is only 16 bytes, and even counting both the in and out edge vectors, that is only 32 bytes. So where have the remaining 140+ bytes gone?

Michael

1 Answer


The real answer is in the profiler and the source code.

I have looked deeper into the Boost source code and found that it keeps two vectors per vertex, for in edges and out edges

That only makes sense when using boost::bidirectionalS instead of boost::undirectedS.

Regardless, obviously the size of your (very very poorly named) property types (MyVertexDescriptor [sic] and MyEdgeDescriptor [sic]) is extremely important. If, e.g. the edge property is 170 bytes, then you'd expect even more memory consumed per edge.

Diving In

Adjacency lists store iterators to out edges in the container of your choice (vecS selector in your example) inside each stored vertex.
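
To make that concrete, here is a rough sketch (illustrative names only, not the actual Boost source) of what each stored vertex conceptually holds for a bidirectionalS graph with vecS edge lists:

#include <cstddef>
#include <vector>

struct stored_edge_sketch {
    std::size_t target;       // vertex_descriptor of the other endpoint
    void*       property_ref; // roughly: a handle to the stored edge property
};

struct stored_vertex_sketch {
    std::vector<stored_edge_sketch> out_edges; // one entry per out edge
    std::vector<stored_edge_sketch> in_edges;  // present for bidirectionalS only
    // ...plus whatever vertex property you attach
};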

I changed your sample to actually only use "own graph" (so removing the distributedS adaptor).

Live On Coliru

#include <boost/core/demangle.hpp>
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/random.hpp>
#include <array>
#include <iostream>
#include <list>
#include <random>
#include <ranges>
#include <set>
#include <string>
#include <type_traits>
#include <utility>
using boost::core::demangle;

static auto const seed = std::random_device{}();
static std::mt19937_64 prng{seed};

// detect(): identify the container backing the edge list and print a size
// for its per-entry storage (element size for vector, node size for list,
// node handle size for set)
template <typename T> static std::string detect(std::vector<T> const&) {
    return "Vector of " + std::to_string(sizeof(T));
}
template <typename T> static std::string detect(std::list<T> const&) {
    // derive from std::list to peek at the size of its internal node type
    struct Hack : std::list<T> {
        constexpr size_t observe_node_size() const { return sizeof(typename Hack::_Node); }
    };
    return "List of " + std::to_string(Hack{}.observe_node_size());
}
template <typename T> static std::string detect(std::set<T> const&) {
    return "Set of " + std::to_string(sizeof(typename std::set<T>::node_type));
}

template <typename D, typename S> void foo() {
    prng.seed(seed);

    struct VProps { std::array<char, 1024> x; };
    struct EProps { std::array<char, 64>   y; };

    using Graph = boost::adjacency_list<S, boost::vecS, D, VProps, EProps>;

    Graph g;
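    // m_vertices, m_edges, m_out_edges and m_in_edges are internal
    // adjacency_list members; they are only used here to inspect the
    // types that are actually stored.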
    using VStored = std::decay_t<decltype(g.m_vertices.front())>;
    using EStored = std::decay_t<decltype(g.m_edges.front())>;

    constexpr auto vsize   = sizeof(VStored);
    constexpr auto esize   = sizeof(EStored);
    constexpr auto adjsize = sizeof(*std::declval<VStored>().m_out_edges.begin());

    static_assert(esize == sizeof(EProps) + 2 * sizeof(typename Graph::vertex_descriptor));

    std::cout << demangle(typeid(D).name()) << " V:" << vsize << " Adj:" << adjsize << " E:" << esize << "\n";
    std::cout << " - out edges:\t" << detect(VStored{}.m_out_edges) << "\n";
    if constexpr (std::is_same_v<D, boost::bidirectionalS>)
        std::cout << " - in edges:\t" << detect(VStored{}.m_in_edges) << "\n";

    generate_random_graph(g, 100'000, 200'000, prng);
}

template <typename S> void foos() {
    std::cout << "\n ------ [ " << demangle(typeid(S).name()) << " ] ----\n";
    foo<boost::directedS, S>();
    foo<boost::undirectedS, S>();
    foo<boost::bidirectionalS, S>();
}

int main() {
    foos<boost::vecS>();
    foos<boost::listS>();
    foos<boost::setS>();
}

Prints e.g.

 ------ [ boost::vecS ] ----
boost::directedS V:1048 Adj:16 E:80
 - out edges:   Vector of 16
boost::undirectedS V:1048 Adj:16 E:80
 - out edges:   Vector of 16
boost::bidirectionalS V:1072 Adj:16 E:80
 - out edges:   Vector of 16
 - in edges:    Vector of 16

 ------ [ boost::listS ] ----
boost::directedS V:1048 Adj:16 E:80
 - out edges:   List of 32
boost::undirectedS V:1048 Adj:16 E:80
 - out edges:   List of 32
boost::bidirectionalS V:1072 Adj:16 E:80
 - out edges:   List of 32
 - in edges:    List of 32

 ------ [ boost::setS ] ----
boost::directedS V:1072 Adj:16 E:80
 - out edges:   Set of 8
boost::undirectedS V:1072 Adj:16 E:80
 - out edges:   Set of 8
boost::bidirectionalS V:1120 Adj:16 E:80
 - out edges:   Set of 8
 - in edges:    Set of 8

As you can see from the static assert

static_assert(esize == sizeof(EProps) + 2 * sizeof(typename Graph::vertex_descriptor));

the storage for edges seems pretty reasonable: just enough to store your own data and the source/target descriptors.
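
With the EProps above (a 64-byte std::array) and vecS vertex storage, where the vertex descriptor is a std::size_t (8 bytes on a typical 64-bit platform), that is 64 + 2·8 = 80 bytes, matching the E:80 in the output. Likewise, V:1048 for vecS is just the 1024-byte VProps plus one std::vector header (typically 24 bytes).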

Improving?

I see a few ways to improve:

  • avoid large properties
  • optimize out edge lists for specific maximum (or average) degree by using small_vector or even static_vector
  • avoid bidirectional graphs (of course, your algorithms may suffer a runtime complexity trade-off)

The last way to improve is to not use boost::adjacency_list at all, but instead model your graph concept with your own custom code. You can optimize this in any way you see fit. Of course, it is also the most work.
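
For a feel of what that could look like, here is a minimal compressed-sparse-row style sketch (illustrative names only, not part of the code measured above) that stores roughly one target index per edge plus one offset per vertex:

#include <cstddef>
#include <utility>
#include <vector>

// Sketch only: a static CSR-like adjacency structure.
// Memory: ~8 bytes per edge (one target index) plus ~8 bytes per vertex
// (one offset), assuming 64-bit std::size_t and no per-edge properties.
struct CsrSketch {
    std::vector<std::size_t> offsets; // size: num_vertices + 1
    std::vector<std::size_t> targets; // concatenated out-edge targets

    // half-open range of out-edge targets for vertex v
    std::pair<const std::size_t*, const std::size_t*> out_edges(std::size_t v) const {
        return {targets.data() + offsets[v], targets.data() + offsets[v + 1]};
    }
};

(Boost also ships boost::compressed_sparse_row_graph if you want roughly this layout while keeping the BGL interfaces.)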

BONUS Profiling

I forgot to include the Massif memory profiling graph of the sample program above. Perhaps it is of use to you:

[Massif memory profiling graph]

sehe
  • Thank you, let me dig into it a little bit... Since I don't define any edge property, I need to investigate why an edge takes that much memory.... – Michael Jun 18 '23 at 15:21
  • The property will default to type `no_property` which [sadly takes one `size_t`](http://coliru.stacked-crooked.com/a/2e4cc2e5aef0248b). It makes [the difference between various graph models come out much clearer](https://imgur.com/Z2MbpV8) for obvious reasons. I believe [EBO](https://en.cppreference.com/w/cpp/language/ebo) would have been beneficial there, but no doubt there's a reason they chose this. – sehe Jun 18 '23 at 18:02