
I am following the networkx documentation (1) and I would like to set different penalties in the cost functions (e.g. node_del_cost and node_ins_cost). Let's say I would like to penalize the deletion/insertion of a node by three points.

So far, I have created two undirected graphs that differ only in the label of node C (UPDATED CODE).

import networkx as nx

G=nx.Graph()
G.add_nodes_from([("A", {'label':'CDKN1A'}), ("B", {'label':'CUL4A'}), 
    ("C", {'label':'RB1'})])

G.add_edges_from([("A","B"), ("A","C")])

H=nx.Graph()
H.add_nodes_from([("A", {'label':'CDKN1A'}), ("B", {'label':'CUL4A'}),
    ("C", {'label':'AKT'})])
H.add_edges_from([("A","B"), ("A","C")])

# arguments
# node_match – a function that returns True if node n1 in G1 and n2 in G2 should be considered equal during matching.
# ignored if node_subst_cost is specified
def node_match(node1, node2):
    return node1['label']==node2['label']

# node_subst_cost - a function that returns the costs of node substitution
# overrides node_match if specified.
def node_subst_cost(node1, node2): 
    return node1['label']==node2['label']

# node_del_cost - a function that returns the costs of node deletion
# if node_del_cost is not specified then default node deletion cost of 1 is used.
def node_del_cost(node1):
    return node1['label']==3    

# node_ins_cost - a function that returns the costs of node insertion
# if node_ins_cost is not specified then default node insertion cost of 1 is used.
def node_ins_cost(node2):
    return node2['label']==3    

paths, cost = nx.optimal_edit_paths(G, H, node_match=None, edge_match=None, 
    node_subst_cost=node_subst_cost, node_del_cost=node_del_cost, node_ins_cost=node_ins_cost, 
    edge_subst_cost=None, edge_del_cost=None, edge_ins_cost=None, 
    upper_bound=None)

# length of the path
print(len(paths))

# optimal edit path cost (graph edit distance).
print(cost)

This gives me 2.0 as the optimal path cost and 7 as the length of the path. However, I do not fully understand why: I set the penalty to 3, so I expected the edit distance to be 3.
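A quick check, reusing two of the cost functions from the code above on hypothetical attribute dicts, suggests where the 3 gets lost: each function returns the result of a comparison (a `bool`), and when used as numbers `True` and `False` count as 1 and 0.

```python
# Two cost functions copied unchanged from the question.
def node_subst_cost(node1, node2):
    return node1['label'] == node2['label']

def node_del_cost(node1):
    return node1['label'] == 3

# Hypothetical attribute dicts mirroring the question's nodes.
same = {'label': 'CDKN1A'}  # node A, identical in both graphs
rb1 = {'label': 'RB1'}      # node C in G
akt = {'label': 'AKT'}      # node C in H

# A label is a string, so comparing it with the int 3 is always False (cost 0).
print(node_del_cost(rb1))                 # False
# Identical labels give True, which counts as a cost of 1 ...
print(node_subst_cost(same, same) + 0.0)  # 1.0
# ... while different labels give False, i.e. a free substitution.
print(node_subst_cost(rb1, akt) + 0.0)    # 0.0
```

So with these definitions, deletion and insertion are free and a substitution costs 1 exactly when the labels match, the opposite of the intended penalty.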

Thank you for your suggestions!

Olha

Olha Kholod
  • Your error at the bottom comes from the `node_del_cost` and `node_ins_cost` parameters: the documentation states that these parameters take callables. 2 is an int, and ints are not callable, so you need to pass some sort of function for these parameters. – Scott Boston Sep 16 '20 at 20:57
  • I figured out that len(paths) is the number of different optimal edit paths, each given as a (node_edit_path, edge_edit_path) tuple. But I still don't understand how to set a custom cost function. – Olha Kholod Sep 17 '20 at 13:56
  • Another observation: if I specify the 'node_match' and 'edge_match' options, nodes with unmatched labels and unmatched edges are penalized by 1. – Olha Kholod Sep 17 '20 at 16:39
  • @OlhaKholod Also consider that the cost functions should return a `float` or `int`; here they return a `bool`. – Azim Mazinani Sep 20 '20 at 17:18
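As noted in the comments, `len(paths)` counts the optimal edit paths that were found; it is not the length of a single path. A minimal sketch that prints each path, rebuilding the question's graphs and using only the question's `node_match` with the default costs (so a mismatched node substitution, a node deletion or a node insertion each cost 1):

```python
import networkx as nx

# The two graphs from the question: only the label of node C differs.
G = nx.Graph()
G.add_nodes_from([("A", {'label': 'CDKN1A'}), ("B", {'label': 'CUL4A'}),
                  ("C", {'label': 'RB1'})])
G.add_edges_from([("A", "B"), ("A", "C")])

H = nx.Graph()
H.add_nodes_from([("A", {'label': 'CDKN1A'}), ("B", {'label': 'CUL4A'}),
                  ("C", {'label': 'AKT'})])
H.add_edges_from([("A", "B"), ("A", "C")])

def node_match(node1, node2):
    return node1['label'] == node2['label']

paths, cost = nx.optimal_edit_paths(G, H, node_match=node_match)

print(f"{len(paths)} optimal edit path(s) with cost {cost}")
for node_edit_path, edge_edit_path in paths:
    # (u, v) pairs: u in G is mapped to v in H; None marks a deletion/insertion
    print("nodes:", node_edit_path)
    # ((u1, v1), (u2, v2)) pairs, with the same None convention
    print("edges:", edge_edit_path)
```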

1 Answer


As mentioned in the documentation, when you pass a node_subst_cost function as a parameter, node_match is ignored and the value your function returns is applied to every substitution operation, even when the nodes are equal. So I would suggest first checking node equality inside node_subst_cost and then applying the cost accordingly:

def node_subst_cost(node1, node2):
    # check if the nodes are equal, if yes then apply no cost, else apply 3
    if node1['label'] == node2['label']:
        return 0
    return 3


def node_del_cost(node):
    return 3  # here you apply the cost for node deletion


def node_ins_cost(node):
    return 3  # here you apply the cost for node insertion


paths, cost = nx.optimal_edit_paths(
    G,
    H,
    node_subst_cost=node_subst_cost,
    node_del_cost=node_del_cost,
    node_ins_cost=node_ins_cost
)

print(cost)  # prints 3.0

You can also do the same for edge operations.
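A sketch of the edge version, assuming the answer's node costs and a hypothetical variant `H2` of `H` with one extra edge B-C, so that an edge insertion is actually needed. Since the question's edges carry no attributes, the hypothetical `edge_subst_cost` here always returns 0; with labelled edges it would compare attributes the same way `node_subst_cost` does.

```python
import networkx as nx

def node_subst_cost(node1, node2):
    # no cost for identical labels, 3 otherwise (as in the answer)
    return 0 if node1['label'] == node2['label'] else 3

def node_del_cost(node):
    return 3

def node_ins_cost(node):
    return 3

def edge_subst_cost(edge1, edge2):
    # edges have no attributes in this example, so matching them is free
    return 0

def edge_del_cost(edge):
    return 3

def edge_ins_cost(edge):
    return 3

G = nx.Graph()
G.add_nodes_from([("A", {'label': 'CDKN1A'}), ("B", {'label': 'CUL4A'}),
                  ("C", {'label': 'RB1'})])
G.add_edges_from([("A", "B"), ("A", "C")])

# Hypothetical variant of H with an extra edge B-C (not in the question).
H2 = nx.Graph()
H2.add_nodes_from([("A", {'label': 'CDKN1A'}), ("B", {'label': 'CUL4A'}),
                   ("C", {'label': 'AKT'})])
H2.add_edges_from([("A", "B"), ("A", "C"), ("B", "C")])

node_costs = dict(node_subst_cost=node_subst_cost,
                  node_del_cost=node_del_cost,
                  node_ins_cost=node_ins_cost)

# Default edge costs: 3 for relabelling C + 1 for inserting edge B-C
_, cost_default = nx.optimal_edit_paths(G, H2, **node_costs)
# Custom edge costs: 3 for relabelling C + 3 for inserting edge B-C
_, cost_custom = nx.optimal_edit_paths(
    G, H2, **node_costs,
    edge_subst_cost=edge_subst_cost,
    edge_del_cost=edge_del_cost,
    edge_ins_cost=edge_ins_cost)

print(cost_default, cost_custom)  # 4.0 6.0
```

With the default edge costs the extra edge adds 1 to the distance; with the custom edge functions it adds 3.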

Azim Mazinani