0

So I have 2x Polars DF, both containing the columns AuctionId and RealmName. I want to find all the rows which does not exist in one another based on the AuctionId and RealmName combination. Hopefully the below answer can help :)

import polars as pl

dfdata = {"AuctionId": [2084868458, 2085008809, 2087827052, 2087835700, 2087827999, 2087827997],
         "RealmName": ['Gehennas', 'Gehennas', 'Mograine', 'Lakeshire', 'Gehennas', 'Bloodfang']}

df1data = {"AuctionId": [2084868458, 2085008342, 2087827052, 2087833212, 2087827999, 2087812997],
          "RealmName": ['Gehennas', 'Gehennas', 'Mograine', 'Lakeshire', 'Gehennas', 'Bloodfang']}

resultdata = {"AuctionId": [2085008342, 2087833212,2087812997],
         "RealmName": [ 'Gehennas', 'Lakeshire', 'Bloodfang']}

df = pl.DataFrame(dfdata)
df1 = pl.DataFrame(df1data)
resultdf = pl.DataFrame(resultdata)
print(resultdf)
Sarah Messer
  • 3,592
  • 1
  • 26
  • 43
Shamatix
  • 77
  • 1
  • 6

1 Answers1

2

To find the symmetric set difference (any AuctionId/RealmName combination that appears in only one of the two datasets, but you don't care which one):

(
    pl.concat([df, df1])
    .filter(
        pl.count().over(['AuctionId', 'RealmName']) == 1
    )
)
shape: (6, 2)
┌────────────┬───────────┐
│ AuctionId  ┆ RealmName │
│ ---        ┆ ---       │
│ i64        ┆ str       │
╞════════════╪═══════════╡
│ 2085008809 ┆ Gehennas  │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 2087835700 ┆ Lakeshire │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 2087827997 ┆ Bloodfang │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 2085008342 ┆ Gehennas  │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 2087833212 ┆ Lakeshire │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 2087812997 ┆ Bloodfang │
└────────────┴───────────┘

For set difference (e.g., to find only those combinations of AuctionId/RealmName that appear in df1 but do not appear in df ... but that not other way around), you can use an anti join.

(
    df1
    .join(
        df,
        on=['AuctionId', 'RealmName'],
        how='anti'
    )
)
shape: (3, 2)
┌────────────┬───────────┐
│ AuctionId  ┆ RealmName │
│ ---        ┆ ---       │
│ i64        ┆ str       │
╞════════════╪═══════════╡
│ 2085008342 ┆ Gehennas  │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 2087833212 ┆ Lakeshire │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 2087812997 ┆ Bloodfang │
└────────────┴───────────┘