Community detection with (nested and non-nested) stochastic block models for (weighted) bipartite networks using the Python package graph-tool

Question

I'm new to Python and I would like to use the package graph-tool to estimate the optimal number of communities in my network using the stochastic block model (nested and non-nested) approach.

I read the documentation related to the core functions "Graph" (to create a graph) and then "minimize_blockmodel_dl" and "minimize_nested_blockmodel_dl" to finally have what I need, but I couldn't find anything specific for bipartite networks.

It seems that function Graph doesn't allow one to create a bipartite graph, but that would be strange...

For that reason, I created a bipartite graph with the networkx package and then converted it into a graph-tool Graph object using the following functions, which I found online:

def get_prop_type(value, key=None):
    """Map a Python value to a graph-tool property type name."""
    # Deal with the value.
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        tname = 'bool'
    elif isinstance(value, int):
        tname = 'float'
        value = float(value)
    elif isinstance(value, float):
        tname = 'float'
    elif isinstance(value, str):
        tname = 'string'
    elif isinstance(value, dict):
        tname = 'object'
    else:
        tname = 'string'
        value = str(value)
    return tname, value, key

from graph_tool.all import Graph


def nx2gt(nxG):
    """Convert a networkx graph into an undirected graph-tool Graph."""
    # Phase 0: Create an undirected graph-tool Graph
    gtG = Graph(directed=False)
    # Add the Graph properties as "internal properties"
    for key, value in nxG.graph.items():
        # Convert the value and key into a type for graph-tool
        tname, value, key = get_prop_type(value, key)
        prop = gtG.new_graph_property(tname)  # Create the PropertyMap
        gtG.graph_properties[key] = prop      # Set the PropertyMap
        gtG.graph_properties[key] = value     # Set the actual value
    # Phase 1: Add the vertex and edge property maps
    # Go through all nodes and edges and add seen properties
    # Add the node properties first
    nprops = set()  # cache keys to only add properties once
    for node, data in nxG.nodes(data=True):
        # Go through all the properties if not seen and add them.
        for key, val in data.items():
            if key in nprops:
                continue  # Skip properties already added
            # Convert the value and key into a type for graph-tool
            tname, _, key = get_prop_type(val, key)
            prop = gtG.new_vertex_property(tname)  # Create the PropertyMap
            gtG.vertex_properties[key] = prop      # Set the PropertyMap
            # Add the key to the already seen properties
            nprops.add(key)
    # Also add the node id: in NetworkX a node can be any hashable type, but
    # in graph-tool nodes are defined as indices. So we capture any strings
    # in a special PropertyMap called 'id' (modify as needed).
    gtG.vertex_properties['id'] = gtG.new_vertex_property('string')
    # Add the edge properties second
    eprops = set()  # cache keys to only add properties once
    for src, dst, data in nxG.edges(data=True):
        # Go through all the edge properties if not seen and add them.
        for key, val in data.items():
            if key in eprops:
                continue  # Skip properties already added
            # Convert the value and key into a type for graph-tool
            tname, _, key = get_prop_type(val, key)
            prop = gtG.new_edge_property(tname)  # Create the PropertyMap
            gtG.edge_properties[key] = prop      # Set the PropertyMap
            # Add the key to the already seen properties
            eprops.add(key)
    # Phase 2: Actually add all the nodes and edges with their properties
    # Add the nodes
    vertices = {}  # vertex mapping for tracking edges later
    for node, data in nxG.nodes(data=True):
        # Create the vertex and annotate for our edges later
        v = gtG.add_vertex()
        vertices[node] = v
        # Set the vertex properties, not forgetting the id property
        data['id'] = str(node)
        for key, value in data.items():
            gtG.vp[key][v] = value  # vp is short for vertex_properties
    # Add the edges
    for src, dst, data in nxG.edges(data=True):
        # Look up the vertex structs from our vertices mapping and add edge.
        e = gtG.add_edge(vertices[src], vertices[dst])
        # Add the edge properties
        for key, value in data.items():
            gtG.ep[key][e] = value  # ep is short for edge_properties
    return gtG


Calling list_properties() on the converted graph, I see the following:

directed   (graph)   (type: bool, val: 0)
bipartite  (vertex)  (type: string)
id         (vertex)  (type: string)
weight     (edge)    (type: double)

OK: undirected, bipartite, with vertices labelled by an integer sequence and with weights on the edges.

So far, it seems everything's fine.

Finally, when I pass the new Graph object to minimize_blockmodel_dl and call get_blocks() on the result to get the final label of each vertex, I find that vertices from the two sets of the bipartite network end up grouped together in the same clusters. This means that the bipartite property has been lost and the model does not apply this constraint.
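The mixing described above can be checked mechanically. The following is a minimal sketch on plain dictionaries; the names `mixed_blocks`, `block_of` and `side_of` are hypothetical and the labels are made up for illustration. With graph-tool, the block labels would come from get_blocks() and the sides from the 'bipartite' vertex property.

```python
from collections import defaultdict


def mixed_blocks(block_of, side_of):
    """Return the block labels that contain vertices from more than one side."""
    sides_in_block = defaultdict(set)
    for vertex, block in block_of.items():
        sides_in_block[block].add(side_of[vertex])
    return sorted(b for b, sides in sides_in_block.items() if len(sides) > 1)


# Hypothetical labels for five vertices (not taken from a real fit).
block_of = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1}   # vertex -> inferred block
side_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}    # vertex -> side of the bipartite graph
mixed = mixed_blocks(block_of, side_of)     # block 1 holds vertices from both sides
```

An empty result would mean that every block lies entirely on one side of the network.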

Why?
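One likely explanation is that 'bipartite' is stored as an ordinary vertex property, which the inference functions do not read, so no constraint is applied unless it is passed explicitly. The sketch below is untested and rests on assumptions: it assumes that the BlockState constructor accepts the `clabel`, `pclabel`, `recs` and `rec_types` arguments as documented, that they can be forwarded through `state_args`, and that an exponential model suits the edge weights. The helper names `side_label` and `fit_constrained_sbm` are hypothetical, and argument names may differ between graph-tool versions.

```python
def side_label(raw):
    """Turn a stored 'bipartite' value such as '0', '1' or 1.0 into an int."""
    return int(float(raw))


def fit_constrained_sbm(g, nested=False):
    """Fit an SBM whose groups are assumed never to mix the two sides.

    Assumes `g` is a graph-tool Graph with a vertex property 'bipartite'
    and an edge property 'weight', as produced by nx2gt.
    """
    import graph_tool.all as gt  # lazy import: only needed when this is called

    # Constraint labels are given as an integer-valued vertex property map.
    side = g.new_vertex_property("int")
    for v in g.vertices():
        side[v] = side_label(g.vp["bipartite"][v])

    state_args = dict(
        clabel=side,                     # assumption: forbids groups spanning both sides
        pclabel=side,                    # assumption: same labels for the partition prior
        recs=[g.ep["weight"]],           # edge weights as covariates
        rec_types=["real-exponential"],  # assumed weight model; choose to suit the data
    )
    if nested:
        return gt.minimize_nested_blockmodel_dl(g, state_args=state_args)
    return gt.minimize_blockmodel_dl(g, state_args=state_args)
```

For the non-nested case, calling get_blocks() on the returned state should then give labels that can be checked for mixing between the two sides.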

I hope someone who has used these functions can help me solve my problem. Thanks!

You are trying to fit the wrong model. The stochastic block model assumes that communities consist of nodes that are more strongly connected with each other than they are with other nodes. In a bipartite graph, a node that belongs to one "layer" is guaranteed to not connect to other nodes in that layer. In some way, the block model is the opposite of a bi- or multipartite graph, hence it is not surprising when `get_blocks` etc do not recover your bipartite structure. — Paul Brodersen, Apr 01 '19 at 14:36
Finding the node sets corresponding to different layers in a bipartite graph is pretty straightforward and can be done using DFS and node colouring. Pseudocode [here](https://math.stackexchange.com/questions/1477648/how-to-tell-if-a-graph-is-bipartite). — Paul Brodersen, Apr 01 '19 at 14:40
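The node colouring mentioned in the comment above can be sketched in plain Python. This version uses breadth-first rather than depth-first traversal (either works), and the function name `two_colour` and the toy graphs are hypothetical. It assumes the graph is given as a dictionary in which every node appears as a key.

```python
from collections import deque


def two_colour(adjacency):
    """Return {node: 0 or 1} if the graph is bipartite, otherwise None.

    `adjacency` maps each node to an iterable of its neighbours (undirected).
    """
    colour = {}
    for start in adjacency:
        if start in colour:
            continue  # already reached from an earlier component
        colour[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in colour:
                    colour[neighbour] = 1 - colour[node]
                    queue.append(neighbour)
                elif colour[neighbour] == colour[node]:
                    return None  # two adjacent nodes share a colour: odd cycle
    return colour


# Hypothetical toy graph: a, b on one side and p, q on the other.
toy = {"a": ["p", "q"], "b": ["q"], "p": ["a"], "q": ["a", "b"]}
sides = two_colour(toy)

# A triangle contains an odd cycle, so it cannot be two-coloured.
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
not_bipartite = two_colour(triangle)
```

Note that this only recovers the two sides of the network; it does not group the vertices within each side into communities.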


0 Answers