
I'm looking for either an existing function from some Python library or a custom Python function using numpy or pandas that is fast and does the following: it takes as input a list of edges in a bipartite graph and returns the subset of edges that are part of alternating paths of length >= 3, excluding the 2 end edges of each such path.

For example, from the following graph: [image of the example bipartite graph]

The input would be:

[[B,1],[C,2],[A,2],[C,3],[D,2],[D,3],[E,3],[E,4]]

and the output would be:

[[C,2],[D,3],[E,3]]

In other words, if an edge is part of an alternating path of length < 3, it is excluded (e.g. [B,1]). Also, for alternating paths of length >= 3, only the non-end edges are included. So, for example, [A,2] and [C,3] are not included, and [D,2] and [E,4] are not included.


Right now I'm creating the incidence matrix for the blue and red node sets and then looking at the entries equal to 1. The entries equal to 1 that sit in a row with row sum > 1 and in a column with column sum > 1 are the connections I'm looking for. But creating the incidence matrix this way seems too slow, so I'm hoping there is a faster solution.
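The row-sum and column-sum rule only needs the degree of each node, so it can be sketched directly on the edge list without any matrix. This is a minimal sketch, assuming the edges sit in a DataFrame with two columns and no missing labels; the names `v1`, `v2` and `edges_with_both_degrees_gt1` are hypothetical. On the sample input this rule also selects [C,3] and [D,2], so it reproduces the rule as described here, not necessarily the exact expected output listed above.

```python
import pandas as pd

def edges_with_both_degrees_gt1(df, v1_col, v2_col):
    """Keep edges whose two endpoints each touch more than one edge."""
    # Drop repeated edges so they do not inflate the degrees
    # (a 0/1 incidence matrix would not count them twice either).
    edges = df[[v1_col, v2_col]].drop_duplicates()
    # Degree of a node = number of edges it appears in,
    # i.e. the row sum / column sum of the incidence matrix.
    deg1 = edges[v1_col].map(edges[v1_col].value_counts())
    deg2 = edges[v2_col].map(edges[v2_col].value_counts())
    return edges[(deg1 > 1) & (deg2 > 1)]

# Sample edge list from the question (hypothetical column names).
df = pd.DataFrame(
    [["B", 1], ["C", 2], ["A", 2], ["C", 3],
     ["D", 2], ["D", 3], ["E", 3], ["E", 4]],
    columns=["v1", "v2"],
)
result = edges_with_both_degrees_gt1(df, "v1", "v2")
print(result.values.tolist())
```

Memory use here grows with the number of edges, not with the product of the two node-set sizes.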

P.S. I know I can easily make a sparse incidence matrix from 2 columns in Pandas like so:

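import pandas as pd
from scipy import sparse

# One-hot encode the v2 nodes, then collapse to one row per v1 node.
# np.int was removed from NumPy; the builtin int does the same job.
A = pd.get_dummies(df[v2_colname]).groupby(df[v1_colname]).max().astype(int)
Asparse = sparse.coo_matrix(A)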

But the problem is that the cardinalities of my blue and red node sets are large, and I get memory errors when the cardinalities are > 50000. So instead, right now I'm iteratively building each row of the sparse incidence matrix and then looking at the 1 entries as described earlier. If there were a fast way of examining the alternating paths in the bipartite graph without building the incidence matrix, that would be an excellent solution.
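If the sparse incidence matrix itself is still wanted, one possible way to avoid the dense `get_dummies` intermediate is to turn the node labels into integer codes and hand the coordinates straight to `scipy.sparse.coo_matrix`. This is only a sketch, assuming no missing labels; `sparse_incidence` and the column names `v1` and `v2` are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import sparse

def sparse_incidence(df, v1_col, v2_col):
    """Build a sparse 0/1 incidence matrix straight from the edge columns."""
    # Drop repeated edges so every stored entry is exactly 1.
    edges = df[[v1_col, v2_col]].drop_duplicates()
    # factorize maps each distinct label to an integer code 0..n-1
    # (assumes no missing labels, which would get code -1).
    rows, row_labels = pd.factorize(edges[v1_col])
    cols, col_labels = pd.factorize(edges[v2_col])
    data = np.ones(len(edges), dtype=np.int64)
    mat = sparse.coo_matrix(
        (data, (rows, cols)), shape=(len(row_labels), len(col_labels))
    ).tocsr()
    return mat, row_labels, col_labels

# Sample edge list from the question (hypothetical column names).
df = pd.DataFrame(
    [["B", 1], ["C", 2], ["A", 2], ["C", 3],
     ["D", 2], ["D", 3], ["E", 3], ["E", 4]],
    columns=["v1", "v2"],
)
mat, row_labels, col_labels = sparse_incidence(df, "v1", "v2")
# Row and column sums of the sparse matrix, flattened to 1-D arrays.
row_sums = np.asarray(mat.sum(axis=1)).ravel()
col_sums = np.asarray(mat.sum(axis=0)).ravel()
```

Only the nonzero entries are stored, so the memory cost scales with the number of edges.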
