0

I am trying to use pandas to select rows based on unique column pairs.

For example with the dataframe below read of of an csv:

    col1  col2   col3
0     1    10    [a, b, c, d]
1     1    10    [e, f, g, h]
2     2    11    [a, b, c, d]   
3     3    12    [i, j, k, l]   
4     3    12    [e, f, g, h]
5     5    14    [a, b, c, d]   
6     3    10    [m, n, o, p] 

This will give me the unique pairs out of col1, col2

df_unique = df['col1', 'col2'].drop_duplicates()

However, I am not sure about how to use each row in df_unique to return a dataframe containing rows that match.

I believe that I could use merge here, but uncertain about the method to use to go about it.

df.merge(df_unique, on=['col1', 'col2'], how='left')

Something like below but an for loop seems like an inefficient way to do this:

for ['col1','col2'] in df_unique:
    df_dict['col1, 'col2'] = df.merge(some_subframe,  on=['col1', 'col2'], how='left')

Resulting in dataframes like so:

df_uniq_list[(1,10)]
    col1  col2   col3 
0     1    10    [a, b, c, d]
1     1    10    [e, f, g, h]

df_uniq_list[(2,11)]
    col1  col2   col3
2     2    11    [a, b, c, d]   

df_uniq_list[(3,12)]
    col1  col2   col3
3     3    12    [i, j, k, l]   
4     3    12    [e, f, g, h]
Bob R
  • 605
  • 1
  • 13
  • 25

1 Answers1

1

You could try with

df_uniq_list = dict([*df.groupby(['col1','col2'])])
df_uniq_list[(1,10)]
    col1  col2   col3 
0     1    10    [a, b, c, d]
1     1    10    [e, f, g, h]
BENY
  • 317,841
  • 20
  • 164
  • 234