Goal: I want to efficiently find all the disconnected graphs (i.e. the connected components) in a large collection of sets.
For example, I have a data file like the following:
A, B, C
C, D, E
A, F, Z
G, J
...
Each entry represents a set of elements. The first entry, A, B, C, is the set {A, B, C}. This also indicates that there is an edge between A and B, A and C, and B and C.
The algorithm I initially came up with was the following:
1. Parse all the entries into a list:
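To make the edge interpretation concrete, here is a minimal sketch that expands one entry into its pairwise edges. The helper name `edges_of` is hypothetical and not part of my actual code; it just illustrates that every pair of elements in a set is treated as connected.

```python
from itertools import combinations

def edges_of(entry):
    """Return every unordered pair of elements in `entry` as a sorted tuple.

    `edges_of` is an illustrative helper name, assumed for this sketch.
    """
    # Sorting first makes each pair come out in a consistent order.
    return set(combinations(sorted(entry), 2))

edges = edges_of({"A", "B", "C"})
print(edges)
```

For the entry {A, B, C} this gives the three edges A-B, A-C and B-C.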
[
{A,B,C}
{C,D,E}
...
]
2. Start with the first element (set) of the list, called start_entry; {A,B,C} in this case.
3. Traverse the other elements in the list and do the following:
   if the intersection of the element and start_entry is not empty:
       start_entry = start_entry union the element
       remove the element from the list
4. With the updated start_entry, traverse the list again until there are no new updates.
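The steps above can be sketched in Python roughly as follows. This is only an illustration of the repeated-merge approach, not a faster alternative. The function name `connected_groups` is made up for the sketch, and the outer loop that restarts with the next leftover entry once a group stops growing is an assumption, since the steps only describe building one group.

```python
def connected_groups(entries):
    """Merge overlapping sets until no more merges are possible."""
    remaining = [set(e) for e in entries]   # step 1: list of sets
    groups = []
    while remaining:                        # assumption: repeat for each leftover entry
        start_entry = remaining.pop(0)      # step 2: take the first set
        updated = True
        while updated:                      # step 4: rescan until nothing changes
            updated = False
            still_unmerged = []
            for element in remaining:       # step 3: scan the other sets
                if start_entry & element:   # non-empty intersection
                    start_entry |= element  # union into start_entry
                    updated = True          # element is dropped from the list
                else:
                    still_unmerged.append(element)
            remaining = still_unmerged
        groups.append(start_entry)
    return groups

entries = [{"A", "B", "C"}, {"C", "D", "E"}, {"A", "F", "Z"}, {"G", "J"}]
groups = connected_groups(entries)
print(groups)
```

With the four sample entries from the data file, this yields two groups: {A, B, C, D, E, F, Z} and {G, J}. Each group may rescan the remaining list several times, which is where the runtime grows with the number of entries.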
The algorithm above should return the list of vertices of a connected graph. However, I ran into runtime problems because of the dataset size: there are ~100000 entries. Does anyone know a more efficient way to find the connected graphs?
The data could also be restructured (if this is easier) as A,B B,C E,F ... with each entry representing an edge of the graph.