
I am using Python's pyspark.pandas API. I have two PySpark DataFrames and I want to merge them using the pyspark.pandas merge() method.

In the merged DataFrame I need one column indicating which side each row came from, i.e. left_only, right_only or both. From my research, pandas supports this, but PySpark does not appear to.
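For reference, this is the plain pandas behaviour being asked about: passing indicator=True to merge() adds a categorical "_merge" column. The frames below are made-up illustrative data, and how="outer" is used only so that all three values show up.

```python
import pandas as pd

# Small illustrative frames that share the join key Col1.
left = pd.DataFrame({"Col1": [1, 2, 3], "a": ["x", "y", "z"]})
right = pd.DataFrame({"Col1": [2, 3, 4], "b": [10, 20, 30]})

# indicator=True appends a categorical "_merge" column whose values are
# "left_only", "right_only" or "both".
merged = left.merge(right, how="outer", on="Col1", indicator=True)
print(merged[["Col1", "_merge"]])
```

With how='left', as in the question, only left_only and both can occur.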

With the code snippet below, the join works fine, but in the final DataFrame, dfMerged, I need one extra column indicating left, right or both.

    dfMerged = _left_df.merge(_right_df, how='left', left_on=['Col1'], right_on=['Col1'], suffixes=('_L', '_R'))

In pandas we can use the code snippet below, but the same snippet does not work in PySpark; it raises an error:

    dfMerged = _left_df.merge(_right_df, how='left', left_on=['Col1'], right_on=['Col1'], suffixes=('_L', '_R'), indicator=True)

Error:

    TypeError: DataFrame.merge() got an unexpected keyword argument 'indicator'
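One possible workaround, sketched here as an assumption rather than a confirmed pyspark.pandas feature: tag each side with a constant marker column before merging, then derive the indicator from which markers are missing after the join. The helper merge_with_indicator and the column names _in_left/_in_right are hypothetical. The sketch only uses methods that exist in both pandas and pyspark.pandas (assign, merge, notnull, astype, map, drop), but it is demonstrated with plain pandas and has not been verified on a Spark cluster here.

```python
import pandas as pd


def merge_with_indicator(left, right, **merge_kwargs):
    """Hypothetical helper emulating merge(..., indicator=True).

    All keyword arguments (how, on, left_on, right_on, suffixes, ...) are
    passed straight through to merge(). The marker column names
    "_in_left"/"_in_right" are assumptions; they must not clash with
    existing columns.
    """
    # Put a constant marker on every row of each side.
    left = left.assign(_in_left=1)
    right = right.assign(_in_right=1)
    merged = left.merge(right, **merge_kwargs)
    # After the join a missing marker means that side had no matching row:
    # 1 -> only the left marker, 2 -> only the right marker, 3 -> both.
    code = (merged["_in_left"].notnull().astype(int)
            + 2 * merged["_in_right"].notnull().astype(int))
    merged["_merge"] = code.map({1: "left_only", 2: "right_only", 3: "both"})
    # Drop the temporary marker columns.
    return merged.drop(columns=["_in_left", "_in_right"])


# Illustrative data; with pyspark.pandas you would pass _left_df/_right_df.
left = pd.DataFrame({"Col1": [1, 2, 3], "a": ["x", "y", "z"]})
right = pd.DataFrame({"Col1": [2, 3, 4], "b": [10, 20, 30]})
dfMerged = merge_with_indicator(left, right, how="left", left_on=["Col1"],
                                right_on=["Col1"], suffixes=("_L", "_R"))
print(dfMerged)
```

Unlike pandas, the resulting "_merge" column holds plain strings rather than a Categorical, which is usually enough for filtering such as dfMerged[dfMerged["_merge"] == "left_only"].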

Mark Rotteveel
Abhishek
