I am using the pandas API on Spark (pyspark.pandas) in Python. I have two DataFrames that I want to merge, and for that I am using the merge() method.
In the merged DataFrame, I need one column indicating which side each row came from, i.e. left_only, right_only, or both. I did some research and found that pandas has something similar, but pyspark.pandas does not.
With the code snippet below, my join works fine, but in my final DataFrame dfMerged I need one extra column indicating left, right, or both:
dfMerged = _left_df.merge(_right_df, how='left', left_on=['Col1'], right_on=['Col1'], suffixes=('_L', '_R'))
In pandas we can use the code snippet below, but the same snippet does not work in pyspark.pandas. It gives me an error:
dfMerged = _left_df.merge(_right_df, how='left', left_on=['Col1'], right_on=['Col1'], suffixes=('_L', '_R'), indicator=True)
Error:
TypeError: DataFrame.merge() got an unexpected keyword argument 'indicator'
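
For reference, here is a rough sketch of the result I am after, built without the indicator argument. It tags each side with a marker column before merging and derives the side from whichever markers survive. The helper name merge_with_indicator and the marker columns _in_left and _in_right are hypothetical, and the sample data is made up. The sketch is written against plain pandas so it runs without a Spark session. It only uses calls that I expect to exist in pyspark.pandas as well (copy, merge, fillna, astype, map, drop), but that is an assumption and I have not verified it there.

```python
import pandas as pd  # assumption: plain pandas, used here only to keep the sketch runnable


def merge_with_indicator(left, right, on, how="left", suffixes=("_L", "_R")):
    """Hypothetical helper: merge two frames and add a pandas-style _merge column."""
    left = left.copy()
    right = right.copy()
    # Marker columns (assumed not to clash with existing column names).
    left["_in_left"] = 1
    right["_in_right"] = 2
    merged = left.merge(right, how=how, left_on=on, right_on=on, suffixes=suffixes)
    # A marker is missing (NaN) when the row has no match on that side.
    code = merged["_in_left"].fillna(0) + merged["_in_right"].fillna(0)
    merged["_merge"] = code.astype(int).map(
        {1: "left_only", 2: "right_only", 3: "both"}
    )
    return merged.drop(columns=["_in_left", "_in_right"])


# Made-up sample data; how="outer" is used so that all three labels can appear.
left_df = pd.DataFrame({"Col1": [1, 2, 3], "Val": ["a", "b", "c"]})
right_df = pd.DataFrame({"Col1": [2, 3, 4], "Val": ["x", "y", "z"]})
result = merge_with_indicator(left_df, right_df, on=["Col1"], how="outer")
print(result)
```

With how='left', as in my snippet above, only left_only and both can occur, since rows that exist only on the right side are dropped by the join.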