Return dataframes containing unique column pairs in Pandas?

Question

I am trying to use pandas to select rows based on unique column pairs.

For example with the dataframe below read of of an csv:

    col1  col2   col3
0     1    10    [a, b, c, d]
1     1    10    [e, f, g, h]
2     2    11    [a, b, c, d]   
3     3    12    [i, j, k, l]   
4     3    12    [e, f, g, h]
5     5    14    [a, b, c, d]   
6     3    10    [m, n, o, p]

This will give me the unique pairs out of col1, col2

df_unique = df['col1', 'col2'].drop_duplicates()

However, I am not sure about how to use each row in df_unique to return a dataframe containing rows that match.

I believe that I could use merge here, but uncertain about the method to use to go about it.

df.merge(df_unique, on=['col1', 'col2'], how='left')

Something like below but an for loop seems like an inefficient way to do this:

for ['col1','col2'] in df_unique:
    df_dict['col1, 'col2'] = df.merge(some_subframe,  on=['col1', 'col2'], how='left')

Resulting in dataframes like so:

df_uniq_list[(1,10)]
    col1  col2   col3 
0     1    10    [a, b, c, d]
1     1    10    [e, f, g, h]

df_uniq_list[(2,11)]
    col1  col2   col3
2     2    11    [a, b, c, d]   

df_uniq_list[(3,12)]
    col1  col2   col3
3     3    12    [i, j, k, l]   
4     3    12    [e, f, g, h]

`[g for _, g in df.groupby(['col1', 'col2'])]` ? – Psidom Dec 04 '22 at 01:03 — Psidom, Dec 04 '22 at 01:03

score 1 · Accepted Answer · answered Dec 04 '22 at 01:08

1

You could try with

df_uniq_list = dict([*df.groupby(['col1','col2'])])
df_uniq_list[(1,10)]
    col1  col2   col3 
0     1    10    [a, b, c, d]
1     1    10    [e, f, g, h]

answered Dec 04 '22 at 01:08

BENY

317,841
20
164
234

Thank you. Do you mind explaining what the asterisk does here? It worked wonderfully. – Bob R Dec 05 '22 at 19:46

Return dataframes containing unique column pairs in Pandas?

1 Answers1