
I have two types of data, X and Y. Every x in X is associated with some number of Ys, and every y in Y may or may not be associated with some number of Xs.

Xs don't associate with other Xs and Ys don't associate with other Ys. So the situation looks like this:

[Figure: connected components]

with Xs on the left and Ys on the right.

I know how to find the connected components of a graph when I only have one type of data: create an N-by-N matrix and call graphconncomp on it. How do I find all connected components when I have two types of data?

rhombidodecahedron
  • Is there any function by which mapping is being done?? You can use that to create N-N matrix. – Mohit Jain Nov 18 '13 at 19:18
  • Did you try using one big matrix with numel(X)+numel(Y) times numel(X)+numel(Y) elements? Should work fine. – Daniel Nov 18 '13 at 19:19
  • @MohitJain Yes, there is an arc from an X to a Y if the X is a substring of the Y. How can I use this to create a N-N matrix? – rhombidodecahedron Nov 18 '13 at 19:20
  • @DanielR: I tried doing this but the matrix is too massive so I run out of memory. In my trial run there are roughly 500 Xs but 50,000 Ys. The matrix is very sparse though - so maybe there is a way to do it of which I am unaware (I am a matlab newbie). – rhombidodecahedron Nov 18 '13 at 19:20
  • 50,000 Ys and you really want to visualize it? I think visualizing will fail, but use a sparse matrix to store it: http://www.mathworks.de/de/help/matlab/ref/sparse.html – Daniel Nov 18 '13 at 19:22
  • I'm not necessarily interested in visualizing it; I would like to separate each of the connected components so that I can solve sub-problems on them. I have been using sparse but so far with no luck. – rhombidodecahedron Nov 18 '13 at 19:32
  • @EarlBellinger you have `|x|+|y|=50,500` nodes in the graph - that's not too big. How many edges do you have? I worked with graphs much larger than these and had no memory issues. – Shai Nov 18 '13 at 19:34
  • @Shai: at least as many edges as nodes, probably somewhere on the order of 5 times as many. But I run out of memory just by calling >> zeros(50000); Out of memory. – rhombidodecahedron Nov 18 '13 at 19:38
  • @EarlBellinger no no no! `zeros(50000)` creates a **full** matrix and this is why you ran out of memory. Use only `sparse`! – Shai Nov 18 '13 at 19:42

1 Answer


How to construct the graph's affinity matrix as a sparse matrix:

G = sparse( length(X)+length(Y), length(X)+length(Y) );

This creates an "all zeros" sparse matrix of size (|X|+|Y|)-by-(|X|+|Y|).
If you type

>> whos G

You'll see that although G has roughly 50K^2 entries, it takes almost no memory.

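Now all you have to do is use your function to set a 1 between the corresponding nodes of X and Y, and then you'll be able to run graphconncomp on G. Node i of X corresponds to row/column i of G, and node j of Y corresponds to row/column length(X)+j; set both G( i, length(X)+j ) and G( length(X)+j, i ) so that G is symmetric.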
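The steps above can be sketched as follows. This is only an illustration under assumptions: the toy cell arrays X and Y are made up, and the "X is a substring of Y" rule is taken from the comments on the question. Edges are collected as index lists first and passed to sparse in one call, which is usually faster than assigning entries one at a time. graphconncomp requires the Bioinformatics Toolbox.

```matlab
% Hypothetical toy data (not from the original post).
X = {'ab', 'cd', 'zz'};
Y = {'xabx', 'abcd', 'qq', 'cdcd'};
x = numel(X);
y = numel(Y);

% Collect edge endpoints. Node i is X{i}; node x+j is Y{j}.
rows = [];
cols = [];
for i = 1:x
    for j = 1:y
        if ~isempty(strfind(Y{j}, X{i}))   % X{i} is a substring of Y{j}
            rows(end+1) = i;       %#ok<AGROW>
            cols(end+1) = x + j;   %#ok<AGROW>
        end
    end
end

% Build the sparse matrix in one call, then add the transpose so that
% every edge is stored in both directions (G becomes symmetric).
G = sparse(rows, cols, 1, x + y, x + y);
G = G + G.';

% S is the number of components, C(k) is the component label of node k.
[S, C] = graphconncomp(G);
```

For real data, the double loop over all X/Y pairs is the expensive part; the sparse matrix itself stays small because only the edges are stored.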


The bipartite case

To construct an adjacency matrix for a bipartite graph you can work (initially) with a much smaller (still sparse) matrix B of size |X|-by-|Y|. Let x=length(X) and y=length(Y), then

 B = sparse( x, y ); % all-zero sparse matrix; with an estimate nz of the number of edges, spalloc( x, y, nz ) preallocates the storage

The entry B( ix, jy ) is set to 1 iff node X(ix) is connected to node Y(jy).
Once you have finished constructing B, you can use it to form G simply by

 G = [ sparse( x, x ), B; B.', sparse(y, y)];

Note that I create the all-zero blocks with sparse rather than zeros, so the construction stays memory-efficient.

Now you can run graphconncomp on G.
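A minimal end-to-end sketch of the bipartite construction, again on made-up toy data and assuming the substring rule mentioned in the comments on the question. The names labelsX, labelsY, compsWithX and subB are hypothetical helpers introduced here for illustration. The last part shows one way to pull out, for every component that contains at least one X, the corresponding block of B.

```matlab
% Hypothetical toy data (not from the original post).
X = {'ab', 'cd', 'zz'};
Y = {'xabx', 'abcd', 'qq', 'cdcd'};
x = numel(X);
y = numel(Y);

% B(ix, jy) = 1 iff X{ix} is a substring of Y{jy}.
B = sparse(x, y);
for ix = 1:x
    hits = ~cellfun(@isempty, strfind(Y, X{ix}));   % logical mask over Y
    B(ix, hits) = 1;
end

% Symmetric adjacency matrix: X nodes first, then Y nodes.
G = [sparse(x, x), B; B.', sparse(y, y)];

% Requires the Bioinformatics Toolbox.
[S, C] = graphconncomp(G);

% The first x labels belong to X, the remaining y labels to Y.
labelsX = C(1:x);
labelsY = C(x+1:end);

% Components that contain at least one X, and the sub-block of B for each.
compsWithX = unique(labelsX);
subB = cell(1, numel(compsWithX));
for k = 1:numel(compsWithX)
    xIdx = find(labelsX == compsWithX(k));
    yIdx = find(labelsY == compsWithX(k));
    subB{k} = B(xIdx, yIdx);   % biadjacency matrix of this component
end
```

A Y that matches no X (here 'qq') ends up in a component of its own and is skipped by the loop, since its label never appears in labelsX. In MATLAB releases that have graph objects, conncomp(graph(G)) should give equivalent labels without the toolbox.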

Shai
  • If you know how many edges you need (or a pretty good guess) you might get a performance boost by using `spalloc(N,N,numEdges)` so that all of the necessary memory gets allocated up front. – nispio Nov 18 '13 at 22:10
  • What does the (i,j)th cell of G represent? Is it X(i) is connected to Y(j)? Or does it have to be something like X(i) is connected to Y(j-length(Y))? – rhombidodecahedron Nov 19 '13 at 04:47
  • @Shai the initialization of G in the bipartite case gives me an error: Error using horzcat Dimensions of matrices being concatenated are not consistent. Also what does the B; B.' part do? – rhombidodecahedron Nov 19 '13 at 20:23
  • @EarlBellinger please verify that the size of `B` is `|X|`-by-`|Y|`, i.e. `x`-by-`y` as defined in the answer. `G` is constructed of four blocks: the top row holds zeros of size `x`-by-`x` followed by `B`; the bottom row holds `transpose(B)` followed by zeros of size `y`-by-`y`. `transpose(B)` is the same as `B.'` – Shai Nov 19 '13 at 20:41
  • @Shai thanks for your help, I was able to use this to get [S,C]=graphconncomp(G) to work. Now I am wondering: how may I find all of the connected component graphs that contain an X? (This part confuses me because the vector that I get back tells me the connected components of G, not B.) – rhombidodecahedron Nov 20 '13 at 17:53
  • @EarlBellinger the first `x` entries of `C` are the labels of the nodes of `X` and the remaining `y` are those of `Y`. – Shai Nov 20 '13 at 18:22
  • Thanks! That's all I need! I appreciate it a lot! – rhombidodecahedron Nov 20 '13 at 19:22