I'm looking for a fast Python function, either from an existing library or custom-written with numpy or pandas, that does the following: it takes as input a list of edges in a bipartite graph and returns the subset of edges that are part of alternating paths of length >= 3, excluding the two end edges of each such path.
For example, from the following graph:
The input would be:
[[B,1],[C,2],[A,2],[C,3],[D,2],[D,3],[E,3],[E,4]]
and the output would be:
[[C,2],[D,3],[E,3]]
In other words, if an edge is part of an alternating path with length < 3, then it is excluded (e.g. [B,1]). Also, for alternating paths of length >= 3, only the non-end edges are included. So e.g. [A,2] and [C,3] are not included, and [D,2] and [E,4] are not included.
Right now I'm creating the incidence matrix for the blue and red node sets and then looking at the entries equal to 1. The entries equal to 1 that sit in a row with row sum > 1 and in a column with column sum > 1 are the connections I'm looking for. But building the incidence matrix seems too slow, so I'm hoping there is a faster solution.
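To make the rule in the previous paragraph concrete, here is a minimal sketch that applies it directly to the edge list, without any incidence matrix. The row sum of a blue node is just the number of edges it appears in, and likewise for the column sum of a red node, so both can be counted with a groupby. The function name and the column names `v1` and `v2` are made up for illustration, and the sketch assumes the edge list has no missing values. It implements only the row-sum/column-sum rule stated above, which may keep more edges than a stricter alternating-path definition would.

```python
import pandas as pd

def edges_with_both_degrees_above_one(df, left_col, right_col):
    """Keep edges whose left node and right node each appear in more than one edge."""
    # Duplicate edges would inflate the counts, so drop them first.
    df = df.drop_duplicates([left_col, right_col])
    # Per-edge degree of its left node (equivalent to the incidence-matrix row sum).
    left_deg = df.groupby(left_col)[left_col].transform("size")
    # Per-edge degree of its right node (equivalent to the column sum).
    right_deg = df.groupby(right_col)[right_col].transform("size")
    return df[(left_deg > 1) & (right_deg > 1)]

# Edge list from the example above; "v1"/"v2" are hypothetical column names.
edges = pd.DataFrame(
    [["B", 1], ["C", 2], ["A", 2], ["C", 3], ["D", 2], ["D", 3], ["E", 3], ["E", 4]],
    columns=["v1", "v2"],
)
kept = edges_with_both_degrees_above_one(edges, "v1", "v2")
print(kept)
```

Memory use here grows with the number of edges rather than with the product of the two node-set sizes.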
P.S. I know I can easily make a sparse incidence matrix from 2 columns in Pandas like so:
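import numpy as np
import pandas as pd
from scipy import sparse
# df holds one edge per row; v1_colname / v2_colname name the two node columns
# .max() replaces .apply(max), and int replaces np.int (removed in NumPy 1.24)
A = pd.get_dummies(df[v2_colname]).groupby(df[v1_colname]).max().astype(int)
Asparse = sparse.coo_matrix(A)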
But the problem is that the cardinalities of my blue and red node sets are large, and I get memory errors when they exceed 50000. So right now I'm instead building the sparse incidence matrix one row at a time and then looking at the entries equal to 1 as described earlier. If there were a fast way of examining the alternating paths in the bipartite graph without building the incidence matrix, that would be an excellent solution.
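For reference, the memory error most likely comes from the dense `get_dummies` table, which has one cell per (blue node, red node) pair. As a hedged sketch, the sparse matrix can instead be assembled straight from integer codes of the two columns, so only the existing edges are stored. The helper name `sparse_incidence` is made up for illustration, and the sketch assumes no duplicate edges and no missing values.

```python
import numpy as np
import pandas as pd
from scipy import sparse

def sparse_incidence(df, v1_colname, v2_colname):
    """Build a COO incidence matrix from two edge-list columns without a dense intermediate."""
    # factorize maps each label to an integer code 0..n-1 (in order of first appearance).
    rows, row_labels = pd.factorize(df[v1_colname])
    cols, col_labels = pd.factorize(df[v2_colname])
    data = np.ones(len(df), dtype=np.int64)
    A = sparse.coo_matrix((data, (rows, cols)),
                          shape=(len(row_labels), len(col_labels)))
    return A, rows, cols

# Hypothetical column names "v1"/"v2"; edge list taken from the example above.
df = pd.DataFrame(
    [["B", 1], ["C", 2], ["A", 2], ["C", 3], ["D", 2], ["D", 3], ["E", 3], ["E", 4]],
    columns=["v1", "v2"],
)
A, rows, cols = sparse_incidence(df, "v1", "v2")

# Row and column sums of the sparse matrix, flattened to 1-D arrays.
row_sums = np.asarray(A.sum(axis=1)).ravel()
col_sums = np.asarray(A.sum(axis=0)).ravel()

# One boolean per edge: True where both the row sum and the column sum exceed 1.
keep = (row_sums[rows] > 1) & (col_sums[cols] > 1)
selected = df[keep]
```

If only the row and column sums are needed, `np.bincount(rows)` and `np.bincount(cols)` give the same counts without constructing the matrix at all.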