The tool gvpr, which is part of the Graphviz tools, lets you apply rules to a graph and output the modified graph.
From the description:
It copies input graphs to its output, possibly transforming their
structure and attributes, creating new graphs, ...
It looks like you want to remove every node that has an indegree of 0 and whose linked nodes (successors) all have an outdegree of 0.
Here's my version of a gvpr script, nostraynodes.gv:
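BEGIN { node_t n; int candidates[]; int keepers[]; }
/* Pass over every edge and classify its end points. */
E {
    if (tail.indegree == 0 && head.outdegree == 0)
    {
        /* Isolated two-node chain: both ends may have to go. */
        candidates[tail] = 1;
        candidates[head] = 1;
    }
    else if (tail.indegree == 0)
    {
        /* Start of a longer chain. */
        keepers[tail] = 1;
    }
    else if (head.outdegree == 0)
    {
        /* End of a longer chain. */
        keepers[head] = 1;
    }
}
/* After all edges are seen, delete candidates that are not keepers. */
END_G {
    for (candidates[n]) {
        if (n in keepers == 0)
        {
            delete(NULL, n);
        }
    }
}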
Here's what the script does:
First, loop through all edges once and populate two lists:
- candidates - a list of nodes which may have to be removed, and
- keepers - a list of nodes which may end up in candidates but should not be removed.
So what gets added to which list?
- Any two nodes connected to each other, where the tail node does not have any incoming edges and the head node does not have any outgoing edges, form a chain of only 2 nodes and are therefore candidates for deletion, unless the same nodes are also part of another chain longer than 2 nodes.
- A tail node without any incoming edges, but connected to a head node which itself has outgoing edges, is a keeper.
- A head node without any outgoing edges, but connected to a tail node which itself has incoming edges, is also a keeper.
Then, delete all candidates that are not in keepers.
This solution is not generic and only works for the problem stated in the question, that is, keeping only chains that are at least 3 nodes long. It also won't delete short loops (two nodes connected to each other).
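If you want to sanity-check the logic outside of Graphviz, the same two-pass idea can be sketched in plain Python. This is only an illustration, not part of the gvpr script: the function name prune_short_chains and the edge-list representation are my own assumptions, and it only tracks edges, so nodes without any edges are not modelled.

```python
from collections import defaultdict


def prune_short_chains(edges):
    """Return the edges left after removing isolated two-node chains.

    `edges` is a list of (tail, head) pairs. Hypothetical helper that
    mirrors the candidates/keepers logic described above.
    """
    indegree = defaultdict(int)
    outdegree = defaultdict(int)
    for tail, head in edges:
        outdegree[tail] += 1
        indegree[head] += 1

    candidates, keepers = set(), set()
    for tail, head in edges:
        if indegree[tail] == 0 and outdegree[head] == 0:
            # Two-node chain: both ends may have to be removed.
            candidates.update((tail, head))
        elif indegree[tail] == 0:
            # Start of a longer chain.
            keepers.add(tail)
        elif outdegree[head] == 0:
            # End of a longer chain.
            keepers.add(head)

    doomed = candidates - keepers
    # Dropping a node also drops every edge attached to it.
    return [(t, h) for t, h in edges if t not in doomed and h not in doomed]


if __name__ == "__main__":
    sample = [(1, 2), (2, 3), (3, 4), (5, 6), (9, 10), (10, 9)]
    print(prune_short_chains(sample))
    # prints [(1, 2), (2, 3), (3, 4), (9, 10), (10, 9)]
```

The pair 5 -> 6 is dropped because it is a chain of only 2 nodes, while the short loop between 9 and 10 survives, which matches the limitation mentioned above.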
You can call the script using the following line:
gvpr -c -f .\nostraynodes.gv .\graph.dot
The output using your sample graph is:
digraph g {
    1 -> 2;
    2 -> 3;
    3 -> 4;
}
Please note that this is my first gvpr script, so there are probably better ways to write this. I'm not sure how it handles 35000 nodes, though I expect that should not be a big deal.
See also "Graphviz/Dot - how to mark all leaves in a tree with a distinctive color?" for a simpler example of graph transformation.