There are few tricks you can do.
Some ideas:
- you can try inner join, and then get the list of unique values, which appear in both dataframes. Then you can use the
isin
method in the smaller dataframe and that list to get your answer.
Dunno if this will work out of the box, but it would be something like:
df_join = df_small.join(df_big, on='key', allow_duplicates=True)
common_samples = df_join[key].tolist()
df_small['is_in_df_big'] = df_small.key.isin(common_samples)
# If it is something you gonna reuse a lot, but be worth doing
df_small = df_small.materialize('is_in_df_big') # to put it in memory otherwise it will be lazily recomputed each time you need it.
Similar idea: instead of doing join do something like:
unique_samples = df_small.key.unique()
common_samples = df_big[df_big.key.isin(unique_samples)].key.unqiue()
df_small['is_in_df_big'] = df_small.key.isin(common_samples)
I dunno which one would be faster. I hope this at least will lead to some inspiration if not to the full solution.