How can I do an outer join on two columns

Question

So I have 2x Polars DF, both containing the columns AuctionId and RealmName. I want to find all the rows which does not exist in one another based on the AuctionId and RealmName combination. Hopefully the below answer can help :)

import polars as pl

dfdata = {"AuctionId": [2084868458, 2085008809, 2087827052, 2087835700, 2087827999, 2087827997],
         "RealmName": ['Gehennas', 'Gehennas', 'Mograine', 'Lakeshire', 'Gehennas', 'Bloodfang']}

df1data = {"AuctionId": [2084868458, 2085008342, 2087827052, 2087833212, 2087827999, 2087812997],
          "RealmName": ['Gehennas', 'Gehennas', 'Mograine', 'Lakeshire', 'Gehennas', 'Bloodfang']}

resultdata = {"AuctionId": [2085008342, 2087833212,2087812997],
         "RealmName": [ 'Gehennas', 'Lakeshire', 'Bloodfang']}

df = pl.DataFrame(dfdata)
df1 = pl.DataFrame(df1data)
resultdf = pl.DataFrame(resultdata)
print(resultdf)

score 2 · Answer 1 · answered Aug 24 '22 at 20:29

To find the symmetric set difference (any AuctionId/RealmName combination that appears in only one of the two datasets, but you don't care which one):

(
    pl.concat([df, df1])
    .filter(
        pl.count().over(['AuctionId', 'RealmName']) == 1
    )
)

shape: (6, 2)
┌────────────┬───────────┐
│ AuctionId  ┆ RealmName │
│ ---        ┆ ---       │
│ i64        ┆ str       │
╞════════════╪═══════════╡
│ 2085008809 ┆ Gehennas  │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 2087835700 ┆ Lakeshire │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 2087827997 ┆ Bloodfang │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 2085008342 ┆ Gehennas  │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 2087833212 ┆ Lakeshire │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 2087812997 ┆ Bloodfang │
└────────────┴───────────┘

For set difference (e.g., to find only those combinations of AuctionId/RealmName that appear in df1 but do not appear in df ... but that not other way around), you can use an anti join.

(
    df1
    .join(
        df,
        on=['AuctionId', 'RealmName'],
        how='anti'
    )
)

shape: (3, 2)
┌────────────┬───────────┐
│ AuctionId  ┆ RealmName │
│ ---        ┆ ---       │
│ i64        ┆ str       │
╞════════════╪═══════════╡
│ 2085008342 ┆ Gehennas  │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 2087833212 ┆ Lakeshire │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 2087812997 ┆ Bloodfang │
└────────────┴───────────┘

How can I do an outer join on two columns

1 Answers1