I have two dataframes as shown below.
import databricks.koalas as ks

input_data = ks.DataFrame({'code': ['123a', '345b', '678c'],
                           'id': [1, 2, 3]})
my_data = ks.DataFrame({'code': ['123a', '12a', '678c'],
                        'id': [7, 8, 9],
                        'stype': ['A', 'E', '5']})
These two dataframes have a column called code, and I want to check the values in column code that exist in my_data and also exist in input_data, and store them in a resulting dataframe called output. The output dataframe will have only the code column values that are present in input_data. The number of columns in each dataframe can differ; I have just shown a sample here.
The output dataframe will have a result such as follows, based on the sample provided in this question.
display(output)
# Result is below
Code id
'123a' 7
I found solutions online that mostly use for loops, but I was wondering if there is a more efficient way to approach this.
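One loop-free way to express the check described above is an inner join on code. The following is only a minimal sketch, not a verified Koalas solution: it uses plain pandas so that it runs without a Spark session, on the assumption that Koalas mirrors the pandas merge and drop_duplicates methods used here. The name wanted_codes is made up for illustration.

```python
import pandas as pd  # stand-in; assumption: the same calls work on databricks.koalas frames

input_data = pd.DataFrame({'code': ['123a', '345b', '678c'],
                           'id': [1, 2, 3]})
my_data = pd.DataFrame({'code': ['123a', '12a', '678c'],
                        'id': [7, 8, 9],
                        'stype': ['A', 'E', '5']})

# Keep only the join key from input_data, de-duplicated so the join
# cannot multiply rows of my_data. (wanted_codes is a hypothetical name.)
wanted_codes = input_data[['code']].drop_duplicates()

# Inner join: keeps the my_data rows whose code also appears in input_data.
output = my_data.merge(wanted_codes, on='code', how='inner')
print(output)
```

On the sample data both 123a and 678c occur in the two frames, so this sketch keeps two rows of my_data (ids 7 and 9) together with all of my_data's columns.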
Thank you all!