This is going to be a long story, but maybe some of you would like to study this case.
I am working on parallel graph algorithm development. I've chosen a cutting-edge, HPC parallel graph data structure named STINGER. STINGER's mission statement reads:
"STINGER should provide a common abstract data structure such that the large graph community can quickly leverage each others' research developments. [...] Algorithms written for STINGER can easily be translated/ported between multiple languages and frameworks [...] It is recognized that no single data structure is optimal for every graph algorithm. The objective of STINGER is to configure a sensible data structure that can run most algorithms well. There should be no significant performance reduction for using STINGER when compared with another general data structure across a broad set of typical graph algorithms."
STINGER is probably rather efficient and suitable for shared-memory parallelism. On the other hand, it is not very abstract, general-purpose, or concise. The interface that STINGER provides is unsatisfactory for me for several reasons: it is too verbose (functions require parameters that are irrelevant to my case), it models only directed graphs whereas I need an undirected one, and there are other reasons as well.
However, I shy away from implementing a new parallel graph data structure on my own.
So I've already started to encapsulate a STINGER instance with my own Graph class. For example, to check whether an undirected edge exists, I can now call Graph::hasEdge(node u, node v) instead of writing into my algorithms:
int to = stinger_has_typed_successor(stinger, etype, u, v);
int back = stinger_has_typed_successor(stinger, etype, v, u);
bool answer = to && back;
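To make the wrapping idea concrete, here is a minimal sketch of such a facade. It does not use STINGER: DirectedStore, hasSuccessor and insertEdge are hypothetical names that stand in for the STINGER instance and its calls, so that the example compiles on its own.

```cpp
#include <cstdint>
#include <set>
#include <utility>

using node = std::int64_t;  // assumption: node ids are 64-bit integers

// Hypothetical stand-in for the STINGER instance: a set of directed arcs.
class DirectedStore {
public:
    void insert(node u, node v) { arcs_.emplace(u, v); }

    bool hasSuccessor(node u, node v) const {
        return arcs_.count(std::make_pair(u, v)) > 0;
    }

private:
    std::set<std::pair<node, node>> arcs_;
};

// Undirected facade: each undirected edge {u, v} is stored as two arcs.
class Graph {
public:
    void insertEdge(node u, node v) {
        store_.insert(u, v);
        store_.insert(v, u);
    }

    // Same logic as the three-line check: both directions must be present.
    bool hasEdge(node u, node v) const {
        return store_.hasSuccessor(u, v) && store_.hasSuccessor(v, u);
    }

private:
    DirectedStore store_;
};
```

In the real wrapper, the two hasSuccessor calls would be the two stinger_has_typed_successor calls, with the edge type fixed inside the class.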
So far, this has worked well. Now to the topic of iteration.
STINGER implements traversal (iteration over nodes, edges, incident edges of a node, etc.) via macros. For example, you write
STINGER_PARALLEL_FORALL_EDGES_BEGIN(G.asSTINGER(), etype) {
node u = STINGER_EDGE_SOURCE;
node v = STINGER_EDGE_DEST;
std::printf("found edge (%d, %d)", u, v);
} STINGER_PARALLEL_FORALL_EDGES_END();
Here STINGER_PARALLEL_FORALL_EDGES_BEGIN expands to
do { \
\
\
for(uint64_t p__ = 0; p__ < (G.asSTINGER())->ETA[(etype)].high; p__++) { \
struct stinger_eb * current_eb__ = ebpool + (G.asSTINGER())->ETA[(etype)].blocks[p__]; \
int64_t source__ = current_eb__->vertexID; \
int64_t type__ = current_eb__->etype; \
for(uint64_t i__ = 0; i__ < stinger_eb_high(current_eb__); i__++) { \
if(!stinger_eb_is_blank(current_eb__, i__)) { \
struct stinger_edge * current_edge__ = current_eb__->edges + i__;
The macro hides the internals of the data structure, which apparently need to be completely exposed for efficient (parallel) iteration. There are macros for various combinations, including STINGER_FORALL_EDGES_BEGIN, STINGER_READ_ONLY_FORALL_EDGES_BEGIN, STINGER_READ_ONLY_PARALLEL_FORALL_EDGES_BEGIN, ...
Yes, I could use these macros, but I wonder whether there is a more elegant way to implement iteration. If I could wish for an interface, it would look similar to
G.forallEdges(readonly=true, parallel=true, {..})
GraphIterTools.forallEdges(G, readonly=true, parallel=true, {...})
where {...} is simply a function, a closure or a "block of code", which would then be executed appropriately. However, I lack the C++ experience to implement this. I wonder what advice you can give me on this issue. Maybe also "You should go with the macros because...".
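To illustrate the kind of interface I have in mind, here is a rough sketch in which the "block of code" is a C++11 lambda passed to a member function template. It is only a sketch under assumptions: EdgeListGraph, addEdge, forallEdges, parallelForallEdges and countEdges are hypothetical names, the edges live in a plain vector instead of a STINGER instance, and OpenMP is assumed for the parallel variant.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using node = std::int64_t;  // assumption: node ids are 64-bit integers

// Hypothetical stand-in for the STINGER-backed Graph class.
class EdgeListGraph {
public:
    explicit EdgeListGraph(node n) : n_(n) {}

    void addEdge(node u, node v) {
        assert(0 <= u && u < n_ && 0 <= v && v < n_);
        edges_.emplace_back(u, v);
    }

    // Sequential traversal: calls handle(u, v) once per stored edge.
    template <typename Handler>
    void forallEdges(Handler handle) const {
        for (const auto& e : edges_) {
            handle(e.first, e.second);
        }
    }

    // Parallel traversal. The pragma only takes effect when OpenMP is
    // enabled (e.g. -fopenmp); otherwise the loop runs sequentially.
    // The handler must be safe to call from several threads at once.
    template <typename Handler>
    void parallelForallEdges(Handler handle) const {
        const std::int64_t m = static_cast<std::int64_t>(edges_.size());
        #pragma omp parallel for
        for (std::int64_t i = 0; i < m; ++i) {
            handle(edges_[i].first, edges_[i].second);
        }
    }

private:
    node n_;
    std::vector<std::pair<node, node>> edges_;
};

// Usage: count edges in parallel. The counter is atomic because the
// lambda may run concurrently.
std::int64_t countEdges(const EdgeListGraph& G) {
    std::atomic<std::int64_t> count{0};
    G.parallelForallEdges([&](node, node) { ++count; });
    return count.load();
}
```

In a STINGER-backed version, I imagine each method body would contain the matching BEGIN/END macro pair and call handle(STINGER_EDGE_SOURCE, STINGER_EDGE_DEST) inside it, so that the macros appear in exactly one place. Whether this is as fast as using the macros directly is something I cannot judge.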