2

I'm currently working on a special-topic project for college. I can remove all the nodes I don't want, but I also want to keep some specific nodes. Here's how I do it.

1. Read the GML file into NetworkX.

2. Use this code to remove the websites I don't want, then write the result to a new GML file:

import networkx as nx
G = nx.read_gml('test.gml')
for i in range(2000):
    for node in G.nodes:
        if "pu.edu.tw" not in node:
            G.remove_node(node)
            break
nx.write_gml(G,"finaltest.gml")

3. As you can see from this part of the GML file, I successfully kept all the 'pu.edu.tw' websites:

graph [
directed 1
multigraph 1
node [
  id 0
  label "https://www.pu.edu.tw/"
]
node [
  id 1
  label "https://alumni.pu.edu.tw/"
]
node [
  id 2
  label "https://freshman.pu.edu.tw/"
]
node [
  id 3
  label "https://tdc.pu.edu.tw/"
]
node [
  id 4
  label "https://alcat.pu.edu.tw/"
]
node [
  id 5
  label "https://www.secretary.pu.edu.tw/"
]
node [
  id 6
  label "https://pugive.pu.edu.tw/"
]

4. The problem is that when I try to draw this GML file with NetworkX, I get some nodes without edges.

5. I found out the reason: I deleted the links related to 'pu.edu.tw', so some edges are missing.

I want to know how to remove the websites I don't want while keeping the specific nodes related to 'pu.edu.tw', so that edges don't go missing, or some way to reconnect the nodes. Thank you.

---------------------------------------------------------------------------------

Update with a new question: what if I want to add multiple conditions, such as

def cleanup(g):
    g_aux = g.to_undirected()
    for node in g_aux.nodes:
        if ("tku.edu.tw" or "scu.edu.tw" or "cycu.edu.tw" or "fcu.edu.tw") not in node:
            for neighbor in g_aux.neighbors(node):
                if "tku.edu.tw" or "scu.edu.tw" or "cycu.edu.tw" or "fcu.edu.tw" in neighbor:
                    break
            else:
                g.remove_node(node)

Is this the right way to do it?

KaiHung
  • 1
    So, these orphans belong to 'pu.edu.tw' but were connected only to nodes that are not from this website, and you want to create edges that replace such indirect connections? Say there were nodes A.pu.edu.tw -- B.oth -- C.oth -- D.pu.edu.tw, then you want to add the edge A -- D? – tyrrr Dec 01 '20 at 13:39
  • Yes! This is what I'm trying to do right now. – KaiHung Dec 02 '20 at 00:52
  • Yes that is the way to do it if you want the nodes to have any of those ids – willcrack Dec 22 '20 at 11:02
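
A caveat on the multi-condition snippet in the update: in Python, `("a" or "b") not in node` evaluates the parenthesised `or` first, which yields just `"a"`, so only the first domain is ever tested. Likewise, `"a" or "b" in neighbor` is always truthy, because the non-empty string `"a"` is truthy on its own. Below is a minimal sketch of one way to test several domains with `any`. The names `KEEP` and `matches` and the toy graph are hypothetical, introduced only for illustration.

```python
import networkx as nx

# Hypothetical names for illustration: KEEP and matches() are not from the original post.
KEEP = ("tku.edu.tw", "scu.edu.tw", "cycu.edu.tw", "fcu.edu.tw")

def matches(label):
    """Return True if the label contains any of the wanted domains."""
    return any(domain in label for domain in KEEP)

def cleanup(g):
    """Remove every node that neither matches nor touches a matching node."""
    # to_undirected() returns an independent copy, so g can be modified
    # while g_aux is being iterated.
    g_aux = g.to_undirected()
    for node in g_aux.nodes:
        if not matches(node) and not any(matches(n) for n in g_aux.neighbors(node)):
            g.remove_node(node)

# Tiny made-up graph to try it on.
demo = nx.DiGraph()
demo.add_edge("https://www.tku.edu.tw/", "https://example.com/")
demo.add_edge("https://example.org/", "https://example.net/")
demo.add_node("https://www.fcu.edu.tw/")
cleanup(demo)
```

In this toy graph, the two matching nodes and the direct neighbor of the `tku.edu.tw` node are kept, while the two unrelated nodes are removed.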

3 Answers

2

If you only want to keep one connection for each orphan node, the one to the "closest" node in your subgraph, you can do the following: after creating the subgraph, iterate over the orphan nodes and, for each of them, run BFS on the original graph, stopping when you find a node whose label contains 'pu.edu.tw' and adding a new edge from that node to the orphan node in the subgraph. With BFS you are guaranteed to find the closest node with the desired property.

The following code should do the trick:

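import networkx as nx

G = nx.read_gml('test.gml')

# Keep only the nodes whose label contains the wanted domain.
desired_nodes = [node for node in G.nodes if 'pu.edu.tw' in node]
# G.subgraph returns a read-only view; nx.Graph(...) makes an editable copy.
subgraph = nx.Graph(G.subgraph(desired_nodes))

# Orphans are wanted nodes that have no edges left in the subgraph.
orphan_nodes = [node for node in subgraph.nodes if subgraph.degree[node] == 0]

for orphan in orphan_nodes:
    # BFS on the original graph visits nodes in order of distance from the orphan.
    # Note: on a directed graph, bfs_edges follows outgoing edges only.
    for _, neigh in nx.bfs_edges(G, orphan):
        if 'pu.edu.tw' in neigh:
            subgraph.add_edge(neigh, orphan)
            break

nx.write_gml(subgraph, "finaltest.gml")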

I've also changed the way nodes are removed from the graph. Instead of the double loop you implemented, I first find the nodes with the desired property using a list comprehension, and then use the subgraph method of networkx.Graph. It's cleaner and works for an arbitrary number of removed nodes (unlike the loops, as you are probably aware). This way a new graph object is created instead of removing nodes from the old one, which is necessary for the algorithm presented above.
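
To illustrate why a new graph object is needed, here is a small sketch, assuming NetworkX 2.0 or later, where `subgraph` returns a read-only view. The toy labels and the variable names `view`, `editable` and `view_is_mutable` are made up for this example.

```python
import networkx as nx

# Toy graph with made-up labels, only for illustration.
G = nx.DiGraph()
G.add_edge("a.pu.edu.tw", "b.example.com")
G.add_edge("b.example.com", "c.pu.edu.tw")

wanted = [n for n in G.nodes if "pu.edu.tw" in n]

# The view shares data with G and refuses modifications.
view = G.subgraph(wanted)
try:
    view.add_edge("a.pu.edu.tw", "c.pu.edu.tw")
    view_is_mutable = True
except nx.NetworkXError:
    view_is_mutable = False

# Wrapping the view in nx.Graph gives an independent, undirected, editable copy.
editable = nx.Graph(view)
editable.add_edge("a.pu.edu.tw", "c.pu.edu.tw")
```

The edge added to `editable` does not appear in `G`, so the original graph stays available for the BFS step.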

willcrack
tyrrr
  • 1
    name `'neigh'` is not defined – willcrack Dec 02 '20 at 15:27
  • Also, adding edges to `G.subgraph(desired_nodes)` directly would raise an error saying that you can't change a frozen graph, probably because they don't want you to risk messing up part of the original graph. It should be `nx.Graph(G.subgraph(desired_nodes))` instead. Np ;) – willcrack Dec 02 '20 at 16:46
1

One thing you can do is keep every node that has a neighbor with "pu.edu.tw" in its name.

Here's the full code:

import networkx as nx

def cleanup(g):
    g_aux = g.to_undirected()
    for node in g_aux.nodes:
        if "pu.edu.tw" not in node:
            for neighbor in g_aux.neighbors(node):
                if "pu.edu.tw" in neighbor:
                    # Found
                    break
            else:
                # Didn't find pu.edu.tw in any neighbors
                g.remove_node(node)

G = nx.read_gml('test.gml')
cleanup(G)
nx.write_gml(G,"finaltest.gml")

The result is every node with "pu.edu.tw" in its name, plus its neighbors.
Please note that I used an undirected version of the graph, g_aux = g.to_undirected(), so that every neighbor of a "pu.edu.tw" node is kept regardless of the direction of the connecting edge.
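
As a small illustration of why the direction matters: on a directed graph, `neighbors` returns only the successors of a node, so a page that links to a "pu.edu.tw" node would be missed. The two-node graph and variable names below are made up for this sketch.

```python
import networkx as nx

# Made-up example: example.com links TO the pu.edu.tw page, not the other way round.
D = nx.DiGraph()
D.add_edge("https://example.com/", "https://www.pu.edu.tw/")

# On a DiGraph, neighbors() yields successors only (outgoing edges).
directed_neighbors = list(D.neighbors("https://www.pu.edu.tw/"))

# The undirected copy ignores edge direction, so the linking page shows up.
undirected_neighbors = list(D.to_undirected().neighbors("https://www.pu.edu.tw/"))
```

Here `directed_neighbors` is empty, while `undirected_neighbors` contains the linking page.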

Here is some code to check whether any "pu.edu.tw" node has no neighbors:

def check_isolated(g):
    for node in g.nodes:
        if "pu.edu.tw" in node:
            if g.degree[node] == 0:
                print(node)

If this prints anything before running cleanup, those nodes are already isolated in the original graph and will always be isolated.

print("before")
check_isolated(G)
print("cleaning...")
cleanup(G)
print("after")
check_isolated(G)
willcrack
0

Here is a question I want to ask:

g_aux = g.to_undirected() 

Why do I have to use g_aux to run this program? I don't understand what the auxiliary graph really does in NetworkX.

KaiHung