
Hi all,

I have a graph with 3 million nodes, made up of many connected components (subgraphs): most have 2/3/4 nodes, and some have up to 8000/9000 nodes. I want to split this large graph into subgraphs of at most 5 nodes each.

Each subgraph that already fulfills this condition I leave as it is. For each one that has more than 5 nodes, I find the edge with the smallest weight and remove it (the graph is weighted), then repeat on the resulting pieces.

However I think my implementation is a complete disaster and the runtime will be longer than my life.

def return_ab(dc):
    # Expects a dict; returns the key with the minimum value
    return min(dc, key=dc.get)

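import networkx as nx

ls_gr_1 = []      # components that already have at most 5 nodes
rem_edges_1 = []  # edges removed along the way

def prune(grph):
    # len(grph) counts nodes; grph.size() would count edges instead
    if len(grph) <= 5:
        ls_gr_1.append(grph)
    else:
        ls = nx.get_edge_attributes(grph, 'DURATION')
        min_edge = return_ab(ls)
        # subgraph views are frozen, so copy before modifying
        unfrozen_graph = nx.Graph(grph)
        rem_edges_1.append(min_edge)
        unfrozen_graph.remove_edge(*min_edge)

        for c in nx.connected_components(unfrozen_graph):
            prune(unfrozen_graph.subgraph(c))
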

I would appreciate a point toward how to achieve this in a more elegant and faster way...

EDIT:

I tried an iterative version, and that seems a tad faster but is still way too slow:


def traverse(graph):
    # Work list of components still to check, seeded with all components
    large_ones = [graph.subgraph(c).copy() for c in nx.connected_components(graph)]
    small_ones = []

    while large_ones:
        graph_to_work = large_ones.pop()
        if len(graph_to_work) <= 5:
            small_ones.append(graph_to_work)
        else:
            ls = nx.get_edge_attributes(graph_to_work, 'DURATION')
            min_edge = return_ab(ls)
            unfrozen_graph = nx.Graph(graph_to_work)
            unfrozen_graph.remove_edge(*min_edge)
            new_components = [unfrozen_graph.subgraph(c).copy() for c in nx.connected_components(unfrozen_graph)]
            large_ones.extend(new_components)

    return small_ones
hristogg

1 Answer


To keep only the subgraphs with 5 or more nodes, try this:

min_net_size = 5
components = list(nx.connected_components(G))

for component in components:
    if len(component) < min_net_size:  # remove small networks
        for node in component:
            G.remove_node(node)

I use this and it's fast at getting the components. You can then work on them as you need.

For example, to compute information for all of the larger components:

# Re-evaluate the components now that the smaller ones are removed, to get
# a value for n, which is used to select the largest n components
components = list(nx.connected_components(G))
n = min(250, len(components))

largest_components = sorted(components, key=len, reverse=True)[:n]

for index in range(n):
    print('Component no. ', index)
    component = G.subgraph(largest_components[index])

    if len(component.nodes) < 1000:  # ignore really big ones!
        spring_3D = nx.spring_layout(component, dim=3, k=0.75, seed=42)
        edges = component.edges()
        nodes = component.nodes()
# Progress from here with whatever you need to do
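
One possible way to progress from here, for the splitting asked about in the question, is sketched below. It is not taken from either post. The idea is that removing the lightest edge of a large component over and over should remove the same edges as a single pass over all edges in ascending weight order, where an edge is removed only if its component still has more than 5 nodes at that moment. The helper names `component_exceeds` and `prune_sorted` are made up for illustration, and the sketch assumes every edge carries a numeric 'DURATION' attribute (ties between equal weights may be broken differently than in the question's code).

```python
import networkx as nx

def component_exceeds(graph, start, limit):
    """Return True if the component containing `start` has more than `limit` nodes.

    The search stops as soon as limit + 1 nodes have been seen, so the cost
    per call does not grow with the size of a huge component.
    """
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                if len(seen) > limit:
                    return True
                stack.append(nbr)
    return False

def prune_sorted(graph, max_nodes=5, weight="DURATION"):
    """Return (pruned copy, removed edges); the input graph is not modified."""
    pruned = graph.copy()
    removed = []
    # sorted() builds a list first, so removing edges inside the loop is safe
    for u, v, w in sorted(pruned.edges(data=weight), key=lambda e: e[2]):
        # Lighter edges that were kept sit in components that were already
        # small, so (u, v) is the lightest edge of any large component it is in
        if component_exceeds(pruned, u, max_nodes):
            pruned.remove_edge(u, v)
            removed.append((u, v))
    return pruned, removed
```

Calling `pruned, removed = prune_sorted(G)` and then iterating over `nx.connected_components(pruned)` would give the small pieces. Sorting happens once and each check is bounded, so this should avoid rebuilding and re-scanning the large components after every single edge removal.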
A Rob4
  • Thanks A Rob4, the first part is pretty clear. But what I actually want to do with the subgraphs that have more than 5 nodes is to split them into smaller subgraphs based on my criterion (remove the smallest edge), and to do this over and over again until they all have at most 5 nodes, which I am not sure I understand how to do. – hristogg Jul 01 '21 at 18:16
  • You should be able to get the edge weights quite easily (see https://stackoverflow.com/questions/40128692/networkx-how-to-add-weights-to-an-existing-g-edges). This will give a dictionary, and you can then select the edges with weights above the threshold. I think! – A Rob4 Jul 02 '21 at 08:43
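
The threshold idea from the last comment could look roughly like the sketch below. The function name `drop_light_edges` and the `threshold` parameter are made up for illustration, and the 'DURATION' attribute name is taken from the question. Note that a fixed threshold does not by itself guarantee components of at most 5 nodes, so this is only a filter, not a full answer to the question.

```python
import networkx as nx

def drop_light_edges(graph, threshold, weight="DURATION"):
    """Return a copy of `graph` without the edges whose weight is below `threshold`."""
    # Dictionary mapping (u, v) -> weight, as mentioned in the comment
    weights = nx.get_edge_attributes(graph, weight)
    pruned = graph.copy()
    # Remove light edges; all nodes are kept, even if they become isolated
    pruned.remove_edges_from([edge for edge, w in weights.items() if w < threshold])
    return pruned
```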