I have a dataframe like this example:
a b
a c
b c
d e
How can I convert it to lists of linked values without duplicates, using pandas or R?
a,b,c
d,e
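This is a graph problem: each row is an edge between two values, and the groups you want are the connected components of that graph. networkx is helpful here: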
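import networkx as nx
# Each row of df is an edge; columns 0 and 1 hold its two endpoints.
G = nx.Graph()
G.add_edges_from(zip(df[0], df[1]))
# Each connected component is one group of linked values.
list(nx.connected_components(G))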
Output:
[{'a', 'b', 'c'}, {'d', 'e'}]
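To get the comma-separated lines shown in the question, the components can be sorted and joined. The following is a minimal, self-contained sketch; it assumes the dataframe has the default integer column labels 0 and 1, and the names `df` and `groups` are only illustrative:

```python
import networkx as nx
import pandas as pd

# Sample data from the question; column labels 0 and 1 are an assumption.
df = pd.DataFrame([["a", "b"], ["a", "c"], ["b", "c"], ["d", "e"]])

# Build an undirected graph with one edge per row.
G = nx.Graph()
G.add_edges_from(zip(df[0], df[1]))

# Components are sets, so sort within each one and across all of them
# to make the output order deterministic.
groups = sorted(",".join(sorted(c)) for c in nx.connected_components(G))

for line in groups:
    print(line)
```

Sorting is only needed if a stable order matters; the grouping itself comes entirely from `nx.connected_components`.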