0

What is the GraphLab equivalent to the following NetworkX code?
for nodeset in nx.connected_components(G):

In GraphLab, I would like to obtain a set of Vertex IDs for each connected component.

user2715877
  • The output of `graphlab.graph_analytics.connected_components.create(G)` should give each vertex ID a component assignment. What else are you trying to do with the vertices for each connected component? – papayawarrior Feb 18 '16 at 22:44

2 Answers

1

The component IDs returned by graphlab.graph_analytics.connected_components are in the form of an SFrame, so the easiest way to get the IDs for a given component is by filtering the SFrame:

# Make a graph with two components.
import graphlab
G = graphlab.SGraph().add_edges(
    [graphlab.Edge(i, i+1) for i in range(3)])
G = G.add_edges([graphlab.Edge(i, i+1) for i in range(4, 6)])

# Get the connected components.
cc = graphlab.connected_components.create(G)

# Find the vertices for a given component (0, in this example).
nodes = cc.component_id.filter_by(0, 'component_id')
print nodes

+------+--------------+
| __id | component_id |
+------+--------------+
|  5   |      0       |
|  6   |      0       |
|  4   |      0       |
+------+--------------+
[3 rows x 2 columns]
papayawarrior
  • Is the filter_by method the fastest way to iterate through the connected components? The graph I built has 25M connected components, mostly of two vertices each. – user2715877 Feb 19 '16 at 04:02
  • Oh, I see - yeah, the best strategy might be different in that case. Are you piping all of the 25M components to some other model, or are you looking at the results manually? If it's just a few calls to `filter_by` I suspect that's the fastest way, but I wouldn't want to call it 25M times... – papayawarrior Feb 19 '16 at 19:38
0

Here is the first cut at porting from NetworkX to GraphLab. However, iterating appears to be very slow.
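# cc is the model returned by graphlab.connected_components.create(G);
# cc['component_id'] is an SFrame with columns '__id' and 'component_id'.
cc_out = cc['component_id']
# One entry per distinct component ID.
id_set = cc_out['component_id'].unique()
for item in id_set:
    # Vertex IDs in this component (each pass scans the whole SFrame).
    nodeset = cc_out[cc_out['component_id'] == item]['__id']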
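
The loop above filters the full table once per component, which is likely why it is slow with millions of components. A single pass that groups vertex IDs by component ID avoids the repeated scans. The following is only a minimal sketch in plain Python, assuming the (vertex ID, component ID) rows have already been pulled out of the SFrame as pairs; the function name `group_vertices_by_component` and the sample rows are hypothetical, and the component ID 1 for vertices 0 to 3 is an assumption.

```python
from collections import defaultdict


def group_vertices_by_component(rows):
    """Group (vertex_id, component_id) pairs into one set of vertex IDs per component."""
    components = defaultdict(set)
    for vertex_id, component_id in rows:
        components[component_id].add(vertex_id)
    return dict(components)


# Hypothetical rows for the two-component example graph.
# Component 0 matches the printed output above; component 1 is assumed.
rows = [(0, 1), (1, 1), (2, 1), (3, 1), (4, 0), (5, 0), (6, 0)]

components = group_vertices_by_component(rows)
for component_id, nodeset in components.items():
    # Each nodeset plays the role of one set yielded by nx.connected_components(G).
    pass
```

This touches each row once, so the cost grows with the number of vertices rather than with vertices times components. The SFrame `groupby` method may be able to do the same grouping inside GraphLab itself; check the documentation for your version before relying on it.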

user2715877