1

I have a data structure that looks like a graph of rather isolated undirected "subgraphs" and would like to get those subgraphs that have less than N edges, in this case only those with one or two edges. I'm a bit lost here and am not sure how to go with it. Actualy, the way I stored the data might not even be the best approach for this problem.

As an example, for graph below I would like to retrieve the nodes A-B, H-J and K-M, but not C-G because that has more than 2 edges.

graph = {  # using sets
    # single (A-B)
    'A': {'B'},
    'B': {'A'},
    # complex (C-G)
    'C': {'D'},
    'D': {'C', 'E'},
    'E': {'D', 'F'},
    'F': {'G', 'E'},
    'G': {'F'},
    # digraph1 (H-J)
    'H': {'J', 'I'},
    'I': {'H'},
    'J': {'H'},
    # digraph2 (K-M)
    'K': {'L'},
    'L': {'K', 'M'},
    'M': {'L'},
}

Any ideas or suggestions on how to tackle this problem in Python?

PedroA
  • 1,803
  • 4
  • 27
  • 50

3 Answers3

1

You follow the standard algorithm for closure of a graph node:

Build a new structure of your convenience.

Pick a starting node however you like; you'll eventually get to all of them.  Put it on the "not visited" list.

While the "not visited" list is not empty:
    Remove the top node from the list.
    Add it to the structure.
    Iterate through its edges.
        If the node on the other end isn't already in the structure,
            add the node and edge to the structure.

Now, just check how many edges are in your structure. If it's less than 3, you have a desired graph.

Remove those nodes from your dictionary. Continue with new starting points until your dictionary is empty. You have now identified all small graphs.

Prune
  • 76,765
  • 14
  • 60
  • 81
  • Sorry, I'm very new to the graph dialect... When you say "Iterate through its edges.", would that mean to iterate over each element of the set for that node? – PedroA Mar 27 '18 at 22:04
1

I'm not sure how much of a general solution you need, so here is a very specific one, using networkx package.

First initialize your graph:

import networkx as nx
edges = [('a','b'),('c','d'),('d','e'),('e','f'),('f','g'),('h','j'),('i','h'),('k','l'),('l','m')]
gr = nx.Graph(edges)

Then find all connected components and iterate over them:

connected = nx.algorithms.connected_components(gr)
for comp in connected:
    if len(comp) <= 3:
        print(comp)

Et voila! (Obviously you do not have to print them, you can do whatver you want with them).

jjj
  • 1,067
  • 1
  • 15
  • 30
  • that seems to give what I was looking for. I'll look into the networkx docs but do you know if there's any way to initialize the graph from a dictionary like the one I already have? – PedroA Mar 27 '18 at 22:29
  • This seems to only count the nodes, not the edges. So it doesn't work when a component has three nodes AND three edges. For example: `('h','j'),('i','h'),('j', 'i')` – Mark Mar 27 '18 at 22:32
  • @Mark_M thanks for pointing this out. I guess you could add a condition for checking the number of edges for a given node in the original graph if it is in one of the connected components. But there is no point now, is there? ;) – jjj Mar 28 '18 at 08:31
  • @jjj I only noticed it because that I did the same thing. – Mark Mar 28 '18 at 08:35
  • @PedroA I'm not sure but I don't think so. In networkx you either add a node (or an edge) at a time or you add them from an iterable, but in the case of edges it has to be an [iterable of tuples](https://networkx.github.io/documentation/stable/tutorial.html#edges). – jjj Mar 28 '18 at 08:38
1

You can do a simple breadth-first search over each node which will identify your subcomponents. You can also count the edges when you do this and return that with the subcomponents. For example given your graph structure you could do something like:

from collections import deque

def bfs(graph, start_node):
    searched = set([start_node])
    edge_count = 0
    queue = deque(start_node)
    while(len(queue)):
        current = queue.popleft()
        for node in graph[current]:
            edge_count += 1                
            if node not in searched:
                searched.add(node)
                queue.append(node)
    return (searched, edge_count/2)

def findSubGraphs(graph):
    subgraphs = []
    found = set()
    for node in graph:
        if node not in found:
            (subgraph, edge_count) = bfs(graph, node)
            found.update(subgraph)
            subgraphs.append((subgraph, edge_count))
    return subgraphs

findSubGraphs(graph)

This should return a data structure with your subgraph's nodes and the edge counts. For example:

[({'A', 'B'}, 1.0),
 ({'C', 'D', 'E', 'F', 'G'}, 4.0),
 ({'H', 'I', 'J'}, 2.0),
 ({'K', 'L', 'M'}, 2.0)]
Mark
  • 90,562
  • 7
  • 108
  • 148
  • 1
    Knowing the number of edges in the subgraphs is actually a great plus for I want to do next. – PedroA Mar 27 '18 at 22:46