Is there a way to split a data set consisting of pairs of 3D points (or just their index numbers) into connected clusters? That is, two pairs (a,b) and (c,d) should be in the same cluster if they share a common point (i.e. a = c, b = c, a = d or b = d) or if there is a chain of one or more other pairs, each sharing a common point with the previous one, from one pair to the other.

For example, the list of pairs:

[[1,2],[2,3],[4,5],[6,7],[7,8],[9,4],[8,5]]

would be grouped as follows:

[[1,2],[2,3]]

[[4,5],[6,7],[7,8],[9,4],[8,5]]

In the first cluster, the pairs (1,2) and (2,3) have the point 2 in common, and share no points with any pairs outside the cluster. In the second cluster, the pair (4,5) shares common points with (9,4) and (8,5), while (8,5) has a common point with (7,8), which has a common point with (6,7).

The data is originally stored in a numpy array, but the output format is not too important.

I need to be able to access the data that makes up each individual cluster afterwards.

Ilmari Karonen
Ben Bird
  • I can't understand the logic of the grouping. If `[1,2],[2,3]` is in a cluster, why isn't `[6,7],[7,8]` also in that, or its own, cluster? What do you mean by "repeated points"? – roganjosh Oct 19 '17 at 18:28
  • @roganjosh I think the problem can be expressed as finding the connected components of a graph, where the given pairs are edges and the numbers are nodes. OP, check out networkx. – Alex Hall Oct 19 '17 at 19:27
  • @AlexHall that explanation unfortunately is not helped by the random edit to add comments (not by the OP or you) to the question. But even so, am I to interpret the required output as identifying a break in a connection graph, and dump the remainder to another list? `[6,7],[7,8]` is still connected, but it appears with the rest. – roganjosh Oct 19 '17 at 19:29
  • @roganjosh I'm guessing OP wants a list of clusters. In this case there are just two, because 4,5,6,7,8,9 are all indirectly connected. – Alex Hall Oct 19 '17 at 19:32
  • @AlexHall I'm of the opinion I should roll back the edit, what are your thoughts? It's not the OP's explanation and actually it clarifies nothing to me but maybe I'm missing something and it helped you get your train of thought. – roganjosh Oct 19 '17 at 19:33
  • @AlexHall Ok, your last comment has clarified it for me perfectly. – roganjosh Oct 19 '17 at 19:34
  • Please don't just repeat the question when it was closed, but instead *edit and improve* the existing question, then ask for it to be reopened. You appear to be asking for the **transitive closure** of a relation (and the connected components), not clustering - a quite fundamental mathematical concept. Using libraries like networkx here seems overkill to me. – Has QUIT--Anony-Mousse Oct 20 '17 at 06:14
  • @Anony-Mousse: Given that the question you closed this as a dupe of has now been deleted, while this one has an accepted answer, I think this question should probably be reopened. I've tried to edit it for clarity. – Ilmari Karonen Oct 29 '17 at 16:10
  • @IlmariKaronen I cannot reopen in the mobile app, sorry. That seems to be browser only. And I mostly read SO on my phone when commuting. – Has QUIT--Anony-Mousse Nov 08 '17 at 18:12
  • @Anony-Mousse: OK, fine, I've voted to reopen it, then. (I'd report that missing feature on meta, but the mobile app is basically abandonware anyway.) Although there *is* an "Open in browser" option in the mobile app (hidden behind "... More", at least on Android), you know. – Ilmari Karonen Nov 08 '17 at 18:18

1 Answer

Using networkx:

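import networkx

# Each pair is an edge; the point labels are the nodes.
edges = [[1, 2], [2, 3], [4, 5], [6, 7], [7, 8], [9, 4], [8, 5]]

graph = networkx.Graph(edges)
# Each connected component is returned as a set of node labels.
print(list(networkx.connected_components(graph)))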

Output (as printed by Python 2; Python 3 shows the same sets as `{1, 2, 3}` and `{4, 5, 6, 7, 8, 9}`):

[set([1, 2, 3]), set([4, 5, 6, 7, 8, 9])]
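The components above are sets of points, whereas the question asks for the pairs that make up each cluster. As a rough sketch of one way to get that without any extra dependency, the pairs can be grouped with a small union-find (disjoint-set) structure. This is a swapped-in technique, not what networkx does internally, and the helper name `group_pairs` is hypothetical. It assumes each point is hashable (an index number, or a tuple of coordinates rather than a NumPy row).

```python
from collections import defaultdict


def group_pairs(pairs):
    """Group pairs into clusters of pairs linked by shared points.

    Hypothetical helper: assumes every point is hashable.
    """
    parent = {}

    def find(x):
        # Register unseen points as their own root, then walk up to the
        # root, shortening the path along the way (path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the sets containing the two endpoints of every pair.
    for a, b in pairs:
        parent[find(a)] = find(b)

    # Collect each pair under the root of its first endpoint.
    clusters = defaultdict(list)
    for a, b in pairs:
        clusters[find(a)].append([a, b])
    return list(clusters.values())


pairs = [[1, 2], [2, 3], [4, 5], [6, 7], [7, 8], [9, 4], [8, 5]]
for cluster in group_pairs(pairs):
    print(cluster)
# [[1, 2], [2, 3]]
# [[4, 5], [6, 7], [7, 8], [9, 4], [8, 5]]
```

Each cluster keeps its pairs in their original input order, so individual clusters can be indexed or converted back to an array afterwards.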
Alex Hall