boost graph library: deterministic order of iteration of in_edges?

Question

TL;DR: I would very much like for the order of iteration of in_edges on my graph (adjacency_list with edge_list of setS) to be determinstic, but as far as I can tell, the order of iteration is determined by a comparison operator that is just pointer comparison---thus iteration order is determined by the vagaries of malloc. Help!

For concreteness, my graph and related types are:

struct VertexCargo { int Id; ... };

typedef adjacency_list<setS, vecS, bidirectionalS, property<vertex_info_t, VertexCargo> > Graph;

typedef graph_traits<Graph>::edge_descriptor   ED;
typedef graph_traits<Graph>::vertex_descriptor VD;

My logic, in case anyone spots a fallacy anywhere:

in_edges iteration is directly determined by iteration of the edge list container
edge list of setS implies underlying container std::set<edge_desc_impl> Note: this assumption was incorrect; it is actually a std::set<StoredEdge, which offers a comparator that compares on edge target.
std::set<edge_desc_impl> iteration order determined by operator<(edge_desc_impl, edge_desc_impl)
operator<(edge_desc_impl, edge_desc_impl) ends up being just a pointer comparison;

The operator< is defined in boost/graph/detail/edge.hpp (boost 1.58):

30      template <typename Directed, typename Vertex>
31      class edge_desc_impl  : public edge_base<Directed,Vertex> {

...

35        typedef void                              property_type;
36
37        inline edge_desc_impl() : m_eproperty(0) {}
38
39        inline edge_desc_impl(Vertex s, Vertex d, const property_type* eplug)
40          : Base(s,d), m_eproperty(const_cast<property_type*>(eplug)) { }
41
42        property_type* get_property() { return m_eproperty; }
43        const property_type* get_property() const { return m_eproperty; }
44
45        //  protected:
46        property_type* m_eproperty;
47      };
48

...

63
64      // Order edges according to the address of their property object
65      template <class D, class V>
66      inline bool
67      operator<(const detail::edge_desc_impl<D,V>& a,
68                 const detail::edge_desc_impl<D,V>& b)
69      {
70        return a.get_property() < b.get_property();
71      }

I would really like it if there were a way to induce the comparison (and thus iteration order) to be based on something deterministic (i.e. not pointer addresses determined by mallloc; for example a custom operator I write that looks at VertexCargo::Id for source and target of the edges) but it looks like that might not be workable here because of the void* cast.

From the code above it also looks possible (?) to inject a property giving the desired ordering. But that strikes me as a hack.

Anybody have wisdom to share?

Note on the answer

@sehe's response is the correct answer---the underlying container is in fact a std::set<StoredEdge>, which defines an operator< that compares on the edge target; this is deterministic when the vertex container selector is vecS but not in general (since the vertex identifiers in other cases are basically pointers gotten from malloc).

What is the goal. Just in-order iteration, or in-order underlying storage? (IOW are you hoping for this to be an optimization, or is a clarity-of-code concern) — sehe, Jun 21 '15 at 19:58
my goal is in-order iteration; I do not require in-order storage if that is not necessary to achieve that. but I'm hoping not to have to dump to a vector and then sort manually, if that's what you're getting at... — David Alexander, Jun 21 '15 at 20:43
I expected this to be easier than it looked, and it is. Some of the assumed types weren't quite alright. See my answer :) — sehe, Jun 21 '15 at 22:40

score 7 · Accepted Answer · answered Jun 21 '15 at 22:39

As I expected (from my recollection) the edge lists (both in/out) are already ordered by their targets, hence they are deterministic.

This is especially unambiguous when the VertexContainer selector is vecS because, there, the vertex_descriptor is a simple integral type, that doubles as your vertex_index_t property anyways.

Down The Rabbit Hole

Because I'm not a Boost Graph developer, and hence I donot know the architecture of the BGL types like adjacency_list, I naively started at our top-level entry point:

template <class Config>
inline std::pair<typename Config::in_edge_iterator,
                 typename Config::in_edge_iterator>
in_edges(typename Config::vertex_descriptor u,
         const bidirectional_graph_helper<Config>& g_)
{
  typedef typename Config::graph_type graph_type;
  const graph_type& cg = static_cast<const graph_type&>(g_);
  graph_type& g = const_cast<graph_type&>(cg);
  typedef typename Config::in_edge_iterator in_edge_iterator;
  return
    std::make_pair(in_edge_iterator(in_edge_list(g, u).begin(), u),
                   in_edge_iterator(in_edge_list(g, u).end(), u));
}

Which is instantiated as:

std::pair<typename Config::in_edge_iterator, typename Config::in_edge_iterator>
boost::in_edges(typename Config::vertex_descriptor, const boost::bidirectional_graph_helper<C> &)

with

Config = boost::detail::adj_list_gen<
    boost::adjacency_list<boost::setS, boost::vecS, boost::bidirectionalS, VertexCargo>, boost::vecS, boost::setS,
    boost::bidirectionalS, VertexCargo, boost::no_property, boost::no_property, boost::listS>::config;

typename Config::in_edge_iterator = boost::detail::in_edge_iter<
    std::_Rb_tree_const_iterator<boost::detail::stored_edge_iter<
        long unsigned int, std::_List_iterator<boost::list_edge<long unsigned int, boost::no_property> >,
        boost::no_property> >,
    long unsigned int, boost::detail::edge_desc_impl<boost::bidirectional_tag, long unsigned int>, long int>;

 typename Config::vertex_descriptor = long unsigned int

Filling in

using Config = boost::detail::adj_list_gen<
     boost::adjacency_list<boost::setS, boost::vecS, boost::bidirectionalS, VertexCargo>, boost::vecS, boost::setS,
     boost::bidirectionalS, VertexCargo, boost::no_property, boost::no_property, boost::listS>::config;

Points us the instantiation at adj_list_gen<...>::config and there the iterator is declared as

typedef in_edge_iter<
    InEdgeIter, vertex_descriptor, edge_descriptor, InEdgeIterDiff
> in_edge_iterator;

// leading to
typedef OutEdgeIter InEdgeIter;

// leading to
typedef typename OutEdgeList::iterator OutEdgeIter;

// leading to
typedef typename container_gen<OutEdgeListS, StoredEdge>::type OutEdgeList;

And because the container selector is setS, it will be a std::set of StoredEdge, which is

typedef typename mpl::if_<on_edge_storage,
    stored_edge_property<vertex_descriptor, EdgeProperty>,
    typename mpl::if_<is_edge_ra,
    stored_ra_edge_iter<vertex_descriptor, EdgeContainer, EdgeProperty>,
    stored_edge_iter<vertex_descriptor, EdgeIter, EdgeProperty>
    >::type
>::type StoredEdge;

Which evaluates to

boost::detail::stored_edge_iter<
    long unsigned int,
    std::_List_iterator<boost::list_edge<long unsigned int, boost::no_property> >,
    boost::no_property>

Now this, of course, points to the implementation of the EdgeList...

Now Hold It! Hit The Brakes

But what's most important is the total weak ordering imposed - so we don't go further down that rabbit-hole, but instead shift our attention to stored_edge_iter<>::operator< or similar.

  inline bool operator<(const stored_edge& x) const
    { return m_target < x.get_target(); }

Aha! The ordering is already deterministically defined. You can access it directly using e.g.

for (auto v : make_iterator_range(vertices(g))) {
    std::cout << v <<  " --> ";

    auto const& iel = boost::in_edge_list(g, v);
    for (auto e : iel) std::cout << e.get_target() << " ";
    std::cout << "\n";
}

But you don't need to. Using the generic graph accessors would be pretty much the same:

std::cout << v <<  " --> ";
for (auto e : make_iterator_range(in_edges(v, g)))  std::cout << source(e, g) << " ";
std::cout << "\n";

And you can verify that the collections are properly sorted as expect using e.g.

assert(boost::is_sorted(make_iterator_range(in_edges(v,g))  | transformed(sourcer(g))));
assert(boost::is_sorted(make_iterator_range(out_edges(v,g)) | transformed(targeter(g))));

DEMO

Here's a full demo including all the above and asserting all the expected orderings on a large random-generated graph:

Live On Coliru

#include <iostream>
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/properties.hpp>
#include <boost/graph/random.hpp>
#include <boost/graph/graph_utility.hpp>
#include <random>

#include <boost/range/adaptors.hpp>
#include <boost/range/algorithm.hpp>
#include <boost/range/algorithm_ext.hpp>

using boost::adaptors::transformed;
using namespace boost;

struct VertexCargo { int Id = rand() % 1024; };

typedef adjacency_list<setS, vecS, bidirectionalS, VertexCargo> Graph;

typedef graph_traits<Graph>::edge_descriptor   ED;
typedef graph_traits<Graph>::vertex_descriptor VD;

struct sourcer {
    using result_type = VD;

    Graph const* g;
    sourcer(Graph const& g) : g(&g) {}
    VD operator()(ED e) const { return boost::source(e, *g); }
};

struct targeter {
    using result_type = VD;

    Graph const* g;
    targeter(Graph const& g) : g(&g) {}
    VD operator()(ED e) const { return boost::target(e, *g); }
};

int main() {

    std::mt19937 rng { std::random_device{}() };

    Graph g;
    generate_random_graph(g, 1ul<<10, 1ul<<13, rng);

    for (auto v : make_iterator_range(vertices(g))) {
        std::cout << v <<  " --> ";

        //auto const& iel = boost::in_edge_list(g, v);
        //for (auto e : iel) std::cout << e.get_target() << " ";

        for (auto e : make_iterator_range(in_edges(v, g)))  std::cout << source(e, g) << " ";
        //for (auto e : make_iterator_range(out_edges(v, g))) std::cout << target(e, g) << " ";

        std::cout << "\n";

        assert(boost::is_sorted(make_iterator_range(in_edges(v,g))  | transformed(sourcer(g))));
        assert(boost::is_sorted(make_iterator_range(out_edges(v,g)) | transformed(targeter(g))));
    }
}

Thanks for going down the rabbit hole! Did you do this using IDE help or just reading all the source? I find the boost source a bit daunting I must admit, and in some cases it defeat my IDE. iRegarding your answer: it seems to be correct---still reading the code... --- but in fact since this order relationship is only a partial order, `in_edges` may remain nondeterministic concerning the order of edges converging on a common target. — David Alexander, Jun 22 '15 at 04:10
Erm. `std::set` by definition doesn't contain elements with equivalent ordering, so your problem doesn't exist :) — sehe, Jun 22 '15 at 06:38
I used gcc + vim + YouCompleMe, by the way. I started off by planting `std::cerr<<__PRETTY_FUNCTION__<< "\n";` into `in_edges()` to get the overload+deduced types. Then it was mainly reading the source, and the `operator<` was located using `YcmCompleter GoToDeclaration` on [the value of foo here](http://paste.ubuntu.com/11755038/). Yes, boost is daunting for me too. Feels more than necessary, sometimes. Oh well :) — sehe, Jun 22 '15 at 06:44
Ah, yes you are correct regarding `std::set`, of course... it only contains one entry per equivalence class... Anyway my mental model for the way edges were stored was incorrect, hence my confusion here. Thanks for drawing my attention to `stored_edge`. The comparison is unambiguous. Let me see where my problem now lies.... — David Alexander, Jun 22 '15 at 14:51
Hi @sehe, thanks to your help I think I have identified the problem. In fact in my codebase it turns out my vector list selector is `listS` not `vecS`, and in this case the comparison you identified is dependent on in-memory addresses. Frankly I am a bit surprised this is so; perhaps it would be more correct for the BGL implementers to compare on `vertex_index`? — David Alexander, Jun 22 '15 at 18:49
It could be "as correct" and incur costs (ordering of EdgeLists is not specified because it isn't required for BGL). All other places with non-integral vertex-descriptors only supply vertex_index when one is needed. Requiring it internally to sort the edges, would incur the overhead of updating it for every mutation operation. Also, the vertex index would be `const` because any change would silently invalidate all the EdgeLists that contained the related vertex. — sehe, Jun 22 '15 at 22:14
So, in fact if you need it, it might be time to adapt your own Graph model. As a long shot you could look at using `stable_vector` (I haven't tried whether `adjacency_list` works with that as the VertexContainer, but it _would_ give stability of iterators and integral descriptors iff BGL picks up on the random-access nature of the iterators) — sehe, Jun 22 '15 at 22:15

boost graph library: deterministic order of iteration of in_edges?

Note on the answer

1 Answers1

Down The Rabbit Hole

Now Hold It! Hit The Brakes

DEMO

Linked