I have a DataFrame like this example:

a b
a c
b c
d e

How can I convert it to lists of linked values without duplicates, using pandas or R? Expected output:

a,b,c
d,e
lczapski

1 Answer

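This is a graph problem: each row is an edge between two values, and the groups you want are the connected components of that graph, so networkx is helpful: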

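import networkx as nx

# Each row of df is an edge between the values in columns 0 and 1
# (assumes df was read without a header, so the columns are labelled 0 and 1).
G = nx.Graph()
G.add_edges_from(zip(df[0], df[1]))

# Each connected component is one group of linked values.
list(nx.connected_components(G))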

Output:

[{'a', 'b', 'c'}, {'d', 'e'}]
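
If networkx is not available, the same grouping can be sketched with a small union-find in plain Python. This is not the networkx implementation, only a minimal illustration of the connected-components idea; the function name `connected_groups` and the `pairs` list are hypothetical, and with the DataFrame above the pairs could presumably be built with `zip(df[0], df[1])`.

```python
def connected_groups(pairs):
    """Group values that are linked by any chain of pairs (union-find sketch)."""
    parent = {}

    def find(x):
        # Register unseen values as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    # Merge the two groups that each pair connects.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect every value under its root.
    groups = {}
    for value in list(parent):
        groups.setdefault(find(value), set()).add(value)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical input mirroring the rows in the question.
pairs = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "e")]
lines = [",".join(group) for group in connected_groups(pairs)]
print("\n".join(lines))
```

Joining each sorted group with commas gives one line per group, in the format asked for in the question.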
Quang Hoang