Connected components of bipartite graphs in MATLAB

Question

I have two types of data, X and Y. Every x in X is associated with some number of Ys, and every y in Y may or may not be associated with some number of Xs.

Xs don't associate with other Xs and Ys don't associate with other Ys. So the situation looks like this:

[Figure: connected components]

with Xs on the left and Ys on the right.

I know how to find the connected components of a graph when I only have one type of data: create an N-by-N matrix and call graphconncomp on it. How do I find all connected components when I have two types of data?

Is there a function that performs the mapping? You can use it to create an N-by-N matrix. — Mohit Jain, Nov 18 '13 at 19:18
Did you try using one big matrix of size (numel(X)+numel(Y))-by-(numel(X)+numel(Y))? It should work fine. — Daniel, Nov 18 '13 at 19:19
@MohitJain Yes, there is an arc from an X to a Y if the X is a substring of the Y. How can I use this to create an N-by-N matrix? — rhombidodecahedron, Nov 18 '13 at 19:20
@DanielR: I tried doing this, but the matrix is too massive, so I run out of memory. In my trial run there are roughly 500 Xs but 50,000 Ys. The matrix is very sparse, though, so maybe there is a way to do it of which I am unaware (I am a MATLAB newbie). — rhombidodecahedron, Nov 18 '13 at 19:20
50,000 Ys and you really want to visualize it? I think visualizing will fail, but you can use a sparse matrix to store it: http://www.mathworks.de/de/help/matlab/ref/sparse.html — Daniel, Nov 18 '13 at 19:22
I'm not necessarily interested in visualizing it; I would like to separate each of the connected components so that I can solve sub-problems on them. I have been using sparse but so far with no luck. — rhombidodecahedron, Nov 18 '13 at 19:32
@EarlBellinger you have `|x|+|y|=50,500` nodes in the graph - that's not too big. How many edges do you have? I worked with graphs much larger than these and had no memory issues. — Shai, Nov 18 '13 at 19:34
@Shai: at least as many edges as nodes, probably somewhere on the order of 5 times as many. But I run out of memory just by calling >> zeros(50000); Out of memory. — rhombidodecahedron, Nov 18 '13 at 19:38
@EarlBellinger no no no! `zeros(50000)` creates a **full** matrix and this is why you ran out of memory. Use only `sparse`! — Shai, Nov 18 '13 at 19:42

score 3 · Accepted Answer · edited Jun 20 '20 at 09:12

How to construct the graph's adjacency matrix as a sparse matrix:

G = sparse( length(X)+length(Y), length(X)+length(Y) );

This creates an "all zeros" sparse matrix of size (|X|+|Y|)-by-(|X|+|Y|).
If you type

>> whos G

You'll see that although G has roughly 50K^2 (about 2.5 billion) entries, it takes almost no memory.
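To put numbers on this, here is a quick sketch using the sizes from the question's trial run (about 500 Xs and 50,000 Ys). It compares the memory `whos` reports for the empty sparse matrix with what a full `zeros(n)` of doubles would need at 8 bytes per entry (about 20 GB here, which is why `zeros(50000)` ran out of memory).

```matlab
% Sizes from the question's trial run: roughly 500 Xs and 50,000 Ys.
x = 500;  y = 50000;
n = x + y;

G = sparse(n, n);          % n-by-n all-zero sparse matrix, no entries stored
info = whos('G');          % info.bytes = memory actually used by G
fullBytes = n * n * 8;     % memory zeros(n) would need: 8 bytes per double

fprintf('sparse G: %d bytes; full zeros(n): %.1f GB\n', ...
        info.bytes, fullBytes / 1e9);
```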

Now all you have to do is use your function to set the entries between corresponding nodes of X and Y to 1 (in both G(i,j) and G(j,i), so that G stays symmetric), and then you can run graphconncomp on G.
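A sketch of that step, assuming your function has already produced the edges as two hypothetical index vectors `xi` and `yj` (edge k connects X(xi(k)) to Y(yj(k))); the numbers below are made up. Building G with a single `sparse(i, j, v, m, n)` call is much faster than assigning entries of a sparse matrix one at a time, and storing every edge in both directions keeps G symmetric.

```matlab
% Hypothetical edge list: edge k connects X(xi(k)) to Y(yj(k)).
% (Made-up example with 3 Xs and 4 Ys.)
nX = 3;  nY = 4;
xi = [1 1 2];
yj = [1 2 2];

n = nX + nY;               % node ids: X(i) -> i, Y(j) -> nX + j
rows = [xi, nX + yj];      % each edge is stored twice, once per direction,
cols = [nX + yj, xi];      % so that G is symmetric (undirected)
G = sparse(rows, cols, 1, n, n);
G = spones(G);             % duplicate edges would sum to 2; reset them to 1
```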

The bipartite case

To construct an adjacency matrix for a bipartite graph you can work (initially) with a much smaller (still sparse) matrix B of size |X|-by-|Y|. Let x=length(X) and y=length(Y), then

 B = sparse( x, y ); % with an estimate nEdges of the edge count, spalloc( x, y, nEdges ) preallocates instead

The entry B( ix, jy ) is set to 1 iff node X(ix) is connected to node Y(jy).
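For the substring rule mentioned in the comments (an X is connected to a Y if the X is a substring of the Y), B could be filled as in the sketch below. It assumes X and Y are cell arrays of character vectors; the example strings are made up. It loops over the (few) Xs rather than the (many) Ys and collects the edge indices so that B is built with a single `sparse` call.

```matlab
% Made-up data: X and Y are cell arrays of character vectors, and X(ix) is
% connected to Y(jy) iff X{ix} is a substring of Y{jy}.
X = {'ab', 'cd', 'zz'};
Y = {'abc', 'xcdx', 'abcd', 'qq'};
x = numel(X);  y = numel(Y);

ii = [];  jj = [];                    % edge indices, collected per X
for ix = 1:x
    % strfind on a cell array returns one (possibly empty) match list per Y
    hits = find(~cellfun(@isempty, strfind(Y, X{ix})));
    ii = [ii, repmat(ix, 1, numel(hits))];
    jj = [jj, hits];
end

B = sparse(ii, jj, 1, x, y);          % B(ix, jy) = 1 iff X{ix} occurs in Y{jy}
```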
Once you have finished constructing B, you can form G from it simply by

 G = [ sparse( x, x ), B; B.', sparse(y, y)];

Note that the all-zero blocks are created with sparse rather than zeros, so the construction stays memory-efficient.
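A small worked example of this block construction, with a made-up B for two Xs and three Ys. It also shows the index mapping: Y(jy) becomes node x + jy of G, so the edge between X(ix) and Y(jy) is stored at both G(ix, x + jy) and G(x + jy, ix).

```matlab
% Made-up biadjacency matrix for x = 2 Xs and y = 3 Ys:
% X(1)-Y(1), X(2)-Y(2) and X(2)-Y(3).
x = 2;  y = 3;
B = sparse([1 2 2], [1 2 3], 1, x, y);

% Top block row:    x-by-x zeros, then B      (edges X -> Y)
% Bottom block row: B.', then y-by-y zeros    (edges Y -> X)
G = [ sparse(x, x), B; B.', sparse(y, y) ];

% Y(jy) is node x + jy of G, so the edge between X(ix) and Y(jy) is stored
% at both G(ix, x + jy) and G(x + jy, ix).
```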

Now you can run graphconncomp on G.
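`graphconncomp` (Bioinformatics Toolbox) treats G as directed by default and returns strongly connected components; because G is symmetric, those are exactly the connected components. If the toolbox is not available, the same kind of output (the number of components S and a label vector C) can be computed with a plain breadth-first search over the sparse matrix. The sketch below is that swapped-in technique, not graphconncomp itself, and its label numbering may differ from graphconncomp's; the example graph is made up. Newer MATLAB releases also provide `conncomp(graph(G))`.

```matlab
% Made-up graph: x = 2 Xs and y = 3 Ys with edges X(1)-Y(1) and X(2)-Y(2);
% Y(3) is not associated with any X.
x = 2;  y = 3;
B = sparse([1 2], [1 2], 1, x, y);
G = [ sparse(x, x), B; B.', sparse(y, y) ];   % symmetric adjacency matrix

% Breadth-first search labelling (assumes G is symmetric).
n = size(G, 1);
C = zeros(1, n);          % C(v) = component label of node v (0 = unvisited)
S = 0;                    % number of components found so far
for v = 1:n
    if C(v) ~= 0
        continue;         % already reached from an earlier start node
    end
    S = S + 1;
    C(v) = S;
    frontier = v;
    while ~isempty(frontier)
        [nbrs, ~] = find(G(:, frontier));   % rows of nonzeros = neighbors
        nbrs = unique(nbrs);
        nbrs = nbrs(C(nbrs) == 0);          % keep only unvisited neighbors
        C(nbrs) = S;
        frontier = nbrs;
    end
end
```

Each node enters the frontier once, so the total work is roughly proportional to the number of nodes plus the number of nonzeros in G.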

edited by Community · answered Nov 18 '13 at 19:49 by Shai

If you know how many edges you need (or a pretty good guess) you might get a performance boost by using `spalloc(N,N,numEdges)` so that all of the necessary memory gets allocated up front. – nispio Nov 18 '13 at 22:10
What does the (i,j)th cell of G represent? Is it X(i) is connected to Y(j)? Or does it have to be something like X(i) is connected to Y(j-length(Y))? – rhombidodecahedron Nov 19 '13 at 04:47
@Shai the initialization of G in the bipartite case gives me an error: Error using horzcat Dimensions of matrices being concatenated are not consistent. Also what does the B; B.' part do? – rhombidodecahedron Nov 19 '13 at 20:23

@EarlBellinger please verify that the size of `B` is `|X|`-by-`|Y|`, i.e. `x`-by-`y` as defined in the answer. `G` is built from four blocks: the top row is zeros of size `x`-by-`x` followed by `B`; the bottom row is `transpose(B)` followed by zeros of size `y`-by-`y`. `transpose(B)` is the same as `B.'` – Shai Nov 19 '13 at 20:41
@Shai thanks for your help, I was able to use this to get [S,C]=graphconncomp(G) to work. Now I am wondering: how may I find all of the connected component graphs that contain an X? (This part confuses me because the vector that I get back tells me the connected components of G, not B.) – rhombidodecahedron Nov 20 '13 at 17:53

@EarlBellinger the first `x` entries of `C` correspond to the nodes of `X` and the remaining `y` entries to the nodes of `Y`. – Shai Nov 20 '13 at 18:22

Thanks! That's all I need! I appreciate it a lot! – rhombidodecahedron Nov 20 '13 at 19:22
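Following up on the last comments (the first x entries of the label vector belong to X), here is a sketch for pulling out each component that contains at least one X, together with its X and Y members, so each can be handed to a sub-problem. It assumes `C` is the label vector from `[S, C] = graphconncomp(G)` (or any equivalent labeling) with the x X-nodes first; the labels used here are made up.

```matlab
% Made-up labels for x = 2 Xs followed by y = 3 Ys, e.g. the second output
% C of [S, C] = graphconncomp(G), with G built from B as in the answer.
x = 2;
C = [1 2 1 2 3];

Cx = C(1:x);               % labels of the X nodes
Cy = C(x+1:end);           % labels of the Y nodes: Y(jy) is node x + jy
compsWithX = unique(Cx);   % components that contain at least one X

% For each such component, collect the indices into X and into Y, so that
% X(compX{m}) and Y(compY{m}) form the m-th sub-problem.
compX = cell(1, numel(compsWithX));
compY = cell(1, numel(compsWithX));
for m = 1:numel(compsWithX)
    compX{m} = find(Cx == compsWithX(m));
    compY{m} = find(Cy == compsWithX(m));
end
```

Components whose label never appears in `Cx` (here, the isolated Y(3)) contain only Ys and are skipped.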