
EDIT: As pointed out by @sehe, the error lies somewhere before the betweenness centrality calculation. Move along!



I implemented a minimal program to compute the betweenness centrality of an undirected graph, in both Python and C++. Surprisingly, the networkx (Python) version far outperforms the boost::graph (C++) implementation, even when one accounts for loading overhead, etc. Am I doing something completely inefficient?

The gist of the Python code is simply

# load graph and start chrono
clist = nx.betweenness_centrality(g)
# output

and for C++ we have

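#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/betweenness_centrality.hpp>
#include <vector>

typedef boost::adjacency_list<boost::vecS,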
                              boost::vecS,
                              boost::undirectedS> Graph;

typedef boost::property_map< Graph, boost::vertex_index_t>::type VertexIndexMap;

int main() {
    Graph g;

    // ... 
    // load graph
    // ...

    VertexIndexMap v_index = get(boost::vertex_index, g);
    std::vector< double > vertex_property_vec(boost::num_vertices(g), 0.0);
    boost::iterator_property_map< std::vector< double >::iterator, VertexIndexMap >
          vertex_property_map(vertex_property_vec.begin(), v_index);


    boost::brandes_betweenness_centrality(g, vertex_property_map);

    // Output ...
    return 0;
}

Note that both libraries seem to implement the exact same algorithm (Brandes 2001).
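For readers who want to see what that shared algorithm does, here is a minimal pure-Python sketch of Brandes' method for an unweighted, undirected graph. It is not the networkx or Boost source: the function name `brandes_betweenness` and the adjacency-dict input are assumptions made for illustration, and the scores are left unnormalized (networkx, for instance, normalizes by default, so its numbers will differ by a constant factor).

```python
from collections import deque

def brandes_betweenness(adj):
    """Unnormalized betweenness centrality (hypothetical helper).

    adj maps each node to an iterable of its neighbours; the graph is
    assumed to be unweighted and undirected (every edge listed both ways).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths (sigma) and
        # recording shortest-path predecessors.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:               # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:    # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies in order of decreasing distance.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair (s, t) was visited from both ends, so halve.
    return {v: c / 2.0 for v, c in bc.items()}

# Path graph 0 - 1 - 2 - 3
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
scores = brandes_betweenness(path)
```

One BFS per source gives O(VE) time on unweighted graphs, which is why a few seconds for 90 small networks is the expected order of magnitude in either library.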

jgyou
  • How do you compile? How do you measure? How *far* outperforms? – Ivan Aksamentov - Drop Jan 27 '16 at 05:32
  • I'm not at all sure, but it's possible both are doing different things. According to [this](https://networkx.github.io/documentation/latest/reference/generated/networkx.algorithms.centrality.betweenness_centrality.html#betweenness-centrality) the python version has an optional parameter k that "If k is not None use k node samples to estimate betweenness. The value of k <= n where n is the number of nodes in the graph. Higher values give better approximation.". Can you try `clist = nx.betweenness_centrality(g,num_vertices)` and see if they are comparable? (You should also check the results). – llonesmiz Jan 27 '16 at 05:39
  • @Drop I compile with a simple `g++ -std=c++0x -o3 -o bc` on gcc-5.3 with boost 1.60.2. When I say "far outperforms", I mean it takes ~10s to compute the BC distribution of 90 small (n<200) networks with networkx, and 50 minutes and counting with boost::graph. Clearly something is *wrong*. – jgyou Jan 27 '16 at 06:01
  • @cv_and_he Thanks for the pointer. The problem is not with nx, which properly implements the method (an inspection of the source reveals that `k=None` defaults to an exact calculation). However, I inspected the output of the C++ code and it appears that the output of `brandes_betweenness_centrality` does not even get stored in the `vertex_property_vec`... – jgyou Jan 27 '16 at 06:15
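To make the `k` discussion in the comments above concrete, here is a rough sketch of the sampling idea: run the single-source step from only `k` randomly chosen nodes and rescale. This is an illustration, not networkx's code; the names `single_source_dependencies` and `sampled_betweenness` are hypothetical, and the `n / k` rescaling is an assumption about one common estimator. With `k` equal to the number of nodes it reduces to the exact (unnormalized) calculation.

```python
import random
from collections import deque

def single_source_dependencies(adj, s):
    """Dependency of source s on every node (unweighted graph, BFS based)."""
    sigma = {v: 0 for v in adj}
    dist = {v: -1 for v in adj}
    preds = {v: [] for v in adj}
    sigma[s], dist[s] = 1, 0
    order, queue = [], deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if dist[w] < 0:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    delta = {v: 0.0 for v in adj}
    for w in reversed(order):
        for v in preds[w]:
            delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
    delta[s] = 0.0  # the source itself is never "between" anything
    return delta

def sampled_betweenness(adj, k, seed=None):
    """Estimate undirected betweenness from k sampled sources (k <= n)."""
    rng = random.Random(seed)
    nodes = list(adj)
    bc = {v: 0.0 for v in nodes}
    for s in rng.sample(nodes, k):
        for v, d in single_source_dependencies(adj, s).items():
            bc[v] += d
    # Assumed rescaling: n / k extrapolates to all sources, and the
    # factor 1/2 accounts for each undirected pair being seen twice.
    scale = len(nodes) / (2.0 * k)
    return {v: c * scale for v, c in bc.items()}

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
exact = sampled_betweenness(path, k=len(path), seed=0)
```

Smaller `k` trades accuracy for speed, so a sampled run and an exact run are not a like-for-like benchmark.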

1 Answer


I can't reproduce the issue. There might be something in the code you didn't show?

Here's hoping my benchmark helps: it runs 90 random graphs of 2000 nodes and 4000 edges in a total of 20 seconds on my PC.

Notes:

  • I calculate and show some arbitrary statistics on the result just so we know the compiler cannot optimize things away
  • Downscaled to 200 nodes/400 edges for Coliru, runs online in 0.6s


#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/betweenness_centrality.hpp>
#include <boost/graph/random.hpp>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics.hpp>
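#include <algorithm>  // std::for_each
#include <functional> // std::ref
#include <iostream>   // std::cout
#include <random>
#include <vector>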

typedef boost::adjacency_list<boost::vecS,
                              boost::vecS,
                              boost::undirectedS> Graph;

typedef boost::property_map<Graph, boost::vertex_index_t>::type VertexIndexMap;

int main() {
    for (int i = 0; i < 90; ++i) {
        auto const seed = std::random_device{}();

        Graph g;
        {
            std::mt19937 prng { seed };
            boost::generate_random_graph(g, 200, 400, prng);
        }

        std::vector<double> centrality(boost::num_vertices(g), 0.0);

        {
            VertexIndexMap v_index = get(boost::vertex_index, g);
            boost::iterator_property_map<std::vector<double>::iterator, VertexIndexMap>
                vertex_property_map = make_iterator_property_map(centrality.begin(), v_index);

            boost::brandes_betweenness_centrality(g, vertex_property_map);
        }

        {
            namespace ba = boost::accumulators;
            namespace bt = ba::tag;
            ba::accumulator_set<double, ba::features<bt::mean, bt::variance> > acc;

            std::for_each(centrality.begin(), centrality.end(), std::ref(acc));

            // Note: the value labelled "stddev" is ba::variance(acc), i.e. the variance, not its square root.
            std::cout << "seed:" << seed << "\t" << "mean:" << ba::mean(acc) << "\t" << "stddev:" << ba::variance(acc) << "\n";
        }
    }
}

Output from my machine (2000 nodes/4000 edges):

seed:1750802922 mean:4376.07    stddev:1.81521e+07
seed:2035487211 mean:4453.37    stddev:1.86408e+07
seed:2157083839 mean:4431.28    stddev:1.76926e+07
seed:877099895  mean:4397.33    stddev:1.77377e+07
seed:3204597236 mean:4437.76    stddev:1.76055e+07
seed:1683789044 mean:4366.15    stddev:1.79065e+07
seed:205823178  mean:4382.23    stddev:1.97325e+07
seed:835182347  mean:4437.69    stddev:1.99322e+07
seed:783544360  mean:4419.82    stddev:1.99628e+07
seed:1294214099 mean:4450.26    stddev:1.86657e+07
seed:133119335  mean:4474.56    stddev:1.80184e+07
seed:2431619152 mean:4398.11    stddev:1.8606e+07
seed:1846518108 mean:4487.64    stddev:1.82487e+07
seed:3215400061 mean:4487.08    stddev:1.89737e+07
seed:4195971142 mean:4366.36    stddev:1.83186e+07
seed:2877690475 mean:4387.66    stddev:1.67049e+07
seed:377384221  mean:4447.82    stddev:1.88145e+07
seed:4271065968 mean:4397.8 stddev:1.90055e+07
seed:2344426096 mean:4439.01    stddev:1.67352e+07
seed:3089481099 mean:4392.55    stddev:1.85857e+07
seed:2366154376 mean:4424.22    stddev:1.8114e+07
seed:609566395  mean:4412.17    stddev:1.83808e+07
seed:532359230  mean:4385.37    stddev:1.90363e+07
seed:1222481049 mean:4389.03    stddev:1.8123e+07
seed:4252784567 mean:4424.44    stddev:1.97951e+07
seed:3589086722 mean:4441.63    stddev:1.89086e+07
seed:3253153938 mean:4434.16    stddev:1.83747e+07
seed:3171332867 mean:4425.64    stddev:1.88349e+07
seed:1628933501 mean:4389.3 stddev:1.77686e+07
seed:2757066761 mean:4456.54    stddev:1.86788e+07
seed:253689423  mean:4457.74    stddev:1.88101e+07
seed:1044077369 mean:4437.64    stddev:1.94368e+07
seed:2010288733 mean:4335.93    stddev:1.96337e+07
seed:2827445098 mean:4404.35    stddev:1.72173e+07
seed:2983615584 mean:4451.17    stddev:1.87881e+07
seed:3263411780 mean:4352.61    stddev:1.84145e+07
seed:209486011  mean:4388.81    stddev:2.00036e+07
seed:914410356  mean:4394.58    stddev:1.8876e+07
seed:4179887676 mean:4458.33    stddev:1.79864e+07
seed:1672110941 mean:4527.26    stddev:1.88183e+07
seed:1180712876 mean:4410.68    stddev:1.77379e+07
seed:3297971268 mean:4314.57    stddev:1.76706e+07
seed:1888708924 mean:4432.68    stddev:1.81473e+07
seed:519304960  mean:4346.21    stddev:1.96675e+07
seed:989700613  mean:4404.25    stddev:1.89632e+07
seed:3290422387 mean:4424.13    stddev:1.82944e+07
seed:1248119514 mean:4449.89    stddev:1.94721e+07
seed:2609686267 mean:4495.49    stddev:1.97461e+07
seed:2169392337 mean:4506.17    stddev:1.67787e+07
seed:222259970  mean:4525.36    stddev:1.94983e+07
seed:2302951742 mean:4449.87    stddev:1.86658e+07
seed:803085249  mean:4434.22    stddev:1.90194e+07
seed:291896941  mean:4388.42    stddev:1.92467e+07
seed:3271497352 mean:4401.03    stddev:1.98458e+07
seed:119293674  mean:4441.89    stddev:1.87025e+07
seed:2067901961 mean:4444.3 stddev:1.91092e+07
seed:884669150  mean:4370   stddev:1.77506e+07
seed:2010782469 mean:4427.87    stddev:1.9524e+07
seed:2999945815 mean:4341.03    stddev:1.93057e+07
seed:1413596477 mean:4429.33    stddev:1.88379e+07
seed:2999144075 mean:4346.83    stddev:1.83441e+07
seed:52996326   mean:4479.39    stddev:1.90295e+07
seed:846523521  mean:4476.82    stddev:1.80105e+07
seed:2665690159 mean:4399.54    stddev:1.92723e+07
seed:1290757175 mean:4373.11    stddev:1.80565e+07
seed:4174263463 mean:4382.66    stddev:1.89344e+07
seed:2416968118 mean:4474.83    stddev:1.91461e+07
seed:1137975099 mean:4406.52    stddev:1.89247e+07
seed:1776900404 mean:4443.7 stddev:1.91418e+07
seed:898128099  mean:4466.54    stddev:1.87237e+07
seed:3604582552 mean:4379.01    stddev:1.73953e+07
seed:3268788789 mean:4418.29    stddev:1.83793e+07
seed:910639960  mean:4507   stddev:1.73813e+07
seed:1878704662 mean:4400.72    stddev:1.81355e+07
seed:2667792405 mean:4462.01    stddev:1.81283e+07
seed:2492001126 mean:4403.49    stddev:1.86103e+07
seed:3485479239 mean:4389.32    stddev:1.84779e+07
seed:3202616710 mean:4539.88    stddev:1.94982e+07
seed:2878361287 mean:4454.12    stddev:1.86021e+07
seed:1196553996 mean:4419.3 stddev:1.86354e+07
seed:3641446403 mean:4451.88    stddev:1.857e+07
seed:2801960787 mean:4469.33    stddev:1.89828e+07
seed:1938419870 mean:4462.89    stddev:1.82868e+07
seed:176826289  mean:4464.34    stddev:1.76994e+07
seed:2873298171 mean:4415.06    stddev:1.87784e+07
seed:2992765364 mean:4395.07    stddev:1.88507e+07
seed:2883991750 mean:4422.02    stddev:1.87585e+07
seed:2985503953 mean:4479.98    stddev:1.91894e+07
seed:3822049160 mean:4439.22    stddev:1.80799e+07
seed:2881148075 mean:4341.83    stddev:1.82921e+07
sehe
  • Your implementation is spot on and your guess is right: the graph wasn't loaded properly (an error in the rest of the code). What is the etiquette in these cases? Should I just delete the question? – jgyou Jan 27 '16 at 19:37
  • Nah. The answer helped you - we just won't see many upvotes. It's still handy code and correctly tagged, in case anyone ever searched SO for a sample showing betweenness-centrality in C++ (besides random generated graphs and statistical accumulators) – sehe Jan 27 '16 at 19:42
  • I was searching SO for a comparison between boost and networkx. How fast did the python/networkx code run on your machine? – David Schumann May 04 '17 at 08:40
  • @DavidNathan if you have python/networkx code available, I could try to run a comparison – sehe May 04 '17 at 09:20
  • I think `import networkx as nx` `g = nx.gnm_random_graph(100,200)` `test = nx.betweenness_centrality(g)` would suffice – David Schumann May 04 '17 at 09:49
  • Have you had the chance to run a comparison? – David Schumann Jan 25 '18 at 12:00
  • @DavidNathan I just did: https://imgur.com/a/wZkeP The C++ is (unsurprisingly) orders of magnitude faster, even when doing the extra statistics on the result. The timing mentioned in my answer was for 90 runs of V=2000,E=4000 graphs – sehe Jan 25 '18 at 13:01