Check if two dataframes have the same values in the column using .isin in koalas dataframe

Question

I am having a small issue comparing two dataframes, shown below. Both are koalas dataframes.

import databricks.koalas as ks


mini_team_df_1 = ks.DataFrame(['0000340b'], columns = ['team_code'])

mini_receipt_df_2 = ks.DataFrame(['0000340b'], columns = ['team_code'])

mini_receipt_df_2['match_flag'] = mini_receipt_df_2['team_code'].isin(ks.DataFrame(mini_team_df_1))

mini_receipt_df_2

I am running this code on Databricks, and I expect mini_receipt_df_2 to look like this:

    team_code   match_flag
0   0000340b     True

But the code above actually produces:

    team_code   match_flag
0   0000340b     False

This makes no sense to me: I would expect .isin to return True for team_code = 0000340b, since that value appears in both dataframes.

Could someone help me understand what is wrong?

Thank you

score 1 · Accepted Answer · 2022-02-09T16:11:55.467

Try this:

import numpy as np
mini_receipt_df_2['match_flag'] = np.isin(mini_receipt_df_2['team_code'].to_numpy(), mini_team_df_1['team_code'].to_numpy())

Output:

>>> mini_receipt_df_2
  team_code  match_flag
0  0000340b        True
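
A likely reason the original attempt returned False: iterating over a DataFrame yields its column labels rather than its values, so `.isin(some_dataframe)` ends up testing each code against the string 'team_code'. The sketch below shows this in plain pandas, used here as a stand-in on the assumption that koalas mirrors pandas in this respect; the variable names are illustrative only.

```python
import pandas as pd

team = pd.DataFrame(['0000340b'], columns=['team_code'])
receipt = pd.DataFrame(['0000340b'], columns=['team_code'])

# Iterating a DataFrame yields its column labels, not its cell values.
labels = list(team)
print(labels)  # ['team_code']

# So this tests each receipt code against the string 'team_code'.
flag_vs_frame = receipt['team_code'].isin(team)
print(flag_vs_frame.tolist())  # [False]

# Passing the column's values as a plain list tests against the actual codes.
flag_vs_values = receipt['team_code'].isin(team['team_code'].tolist())
print(flag_vs_values.tolist())  # [True]
```

On koalas the analogous call would presumably be `mini_receipt_df_2['team_code'].isin(mini_team_df_1['team_code'].to_numpy().tolist())`. That collects the team codes to the driver, so it only suits a lookup column small enough to fit in memory.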

edited Feb 09 '22 at 16:11

answered Feb 09 '22 at 16:06

The input dataframes are koalas dataframes, so I am not sure this will work in my case. Can you help me with a solution that works for koalas dataframes? – Anna Feb 09 '22 at 16:08
What won't work about it? – Feb 09 '22 at 16:10
I get this error message, ```PandasNotImplementedError: The method `pd.Series.__iter__()` is not implemented. If you want to collect your data as an NumPy array, use 'to_numpy()' instead``` – Anna Feb 09 '22 at 16:11
Okay, I see. Check the answer now. I came up with a different solution. – Feb 09 '22 at 16:12

score 0 · Answer 2 · answered Mar 15 '23 at 06:55

# Caveat: assign(match_flag=True) sets the flag on every row, matched or not.
mini_receipt_df_2.merge(mini_team_df_1,how='left',suffixes=[None,'_2'])\
    .assign(match_flag=True)

out:

  team_code  match_flag
0  0000340b        True
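
Because `.assign(match_flag=True)` runs after the left merge, it marks every receipt row True, including rows whose code has no partner in the team dataframe. One way to get a real per-row flag from a merge is to tag the team rows before merging and treat a missing tag as "no match". This is a minimal sketch in plain pandas, used as a stand-in for koalas; the second receipt code `ffff0000` is made up for illustration.

```python
import pandas as pd

team = pd.DataFrame(['0000340b'], columns=['team_code'])
# 'ffff0000' is a hypothetical code with no matching team.
receipt = pd.DataFrame(['0000340b', 'ffff0000'], columns=['team_code'])

# Tag every distinct team code before merging, so the tag only
# survives on receipt rows that found a partner.
tagged = team.drop_duplicates().assign(match_flag=True)

merged = receipt.merge(tagged, on='team_code', how='left')

# Unmatched rows carry a missing value in match_flag; turn that into False.
merged['match_flag'] = merged['match_flag'].notna()

print(merged)
```

Unlike collecting the codes into a list for `.isin`, the merge keeps both sides as dataframes, which may matter when the lookup table is large.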

answered Mar 15 '23 at 06:55

G.G
