I am trying to get a subset of my DataFrame by applying multiple conditions, but I am unable to replicate the regular pandas isin behavior in PySpark. Let's say that my goal DataFrame is (in pandas):
selection = df[string1.isin(look_string.look_string)]
where string1 is built from the same df (a concatenation of other columns) and look_string is a separate single-column df with a different number of rows. The concatenation is
string1 = df.column1 + df.column2 + df.column3
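For concreteness, here is a minimal pandas sketch of what I'm after. The column names and values are placeholders, not my real data:

import pandas as pd

# placeholder frames; the real data has different columns and many more rows
df = pd.DataFrame({"column1": ["a", "b", "c"],
                   "column2": ["1", "2", "3"],
                   "column3": ["x", "y", "z"]})
look_string = pd.DataFrame({"look_string": ["a1x", "c3z"]})

string1 = df.column1 + df.column2 + df.column3          # concatenated key
selection = df[string1.isin(look_string.look_string)]   # keeps the "a1x" and "c3z" rows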
I am able to write everything in Spark except the isin part.
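For reference, this is roughly how the Spark side looks, using the same placeholder data and building string1 with concat (my real code differs; this is just to make the problem reproducible):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# same placeholder data as above, now as Spark DataFrames
df = spark.createDataFrame([("a", "1", "x"), ("b", "2", "y"), ("c", "3", "z")],
                           ["column1", "column2", "column3"])
df = df.withColumn("string1", F.concat("column1", "column2", "column3"))
look_string = spark.createDataFrame([("a1x",), ("c3z",)], ["look_string"])

With that in place, my first attempt is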
df[df.string1.isin(look_string.look_string)]
which gives me a huge error saying "Resolved attribute(s) ... missing from ...". Trying this instead,
df[df.string1.isin(look_string.select("look_string"))]
I get this error: 'DataFrame' object has no attribute '_get_object_id'.
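As far as I can tell, the select call returns another DataFrame rather than a Column or a plain list of values, which is presumably what isin is rejecting:

type(look_string.select("look_string"))  # <class 'pyspark.sql.dataframe.DataFrame'>
type(look_string.look_string)            # <class 'pyspark.sql.column.Column'>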
What would be the best way to proceed?