I have a pyspark.sql DataFrame that was created by an inner join of two DataFrames. After the join, I added a column that gives the week start date derived from the AsOfDate column.
from pyspark.sql.functions import date_sub, next_day
Joined_data = Joined_data.withColumn("Week_start_date", date_sub(next_day('AsOfDate', 'Sun'), 7))
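For reference, here is a minimal plain-Python sketch of what I expect the Week_start_date expression to produce, assuming next_day returns the first Sunday strictly after the given date. The helper name week_start and the sample dates are hypothetical and only for illustration; they are not part of PySpark or my real data.

```python
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Return the Sunday on or before d, mirroring date_sub(next_day(d, 'Sun'), 7)."""
    # date.weekday(): Monday == 0 ... Sunday == 6
    # Days until the first Sunday strictly after d (7 when d is itself a Sunday).
    days_until_next_sunday = (6 - d.weekday()) % 7 or 7
    next_sunday = d + timedelta(days=days_until_next_sunday)
    return next_sunday - timedelta(days=7)


# Hypothetical sample dates standing in for the AsOfDate column.
sample_dates = [date(2024, 1, 6), date(2024, 1, 7), date(2024, 1, 8)]

# The distinct week starts, which is what I want DateList to contain.
distinct_week_starts = sorted({week_start(d) for d in sample_dates})
```

With these sample dates, the distinct set should contain one week start per calendar week, which is the result I am hoping to get from the DataFrame.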
Now I want to create a list (collection) of all the distinct weeks, and I am using the code below.
DateList = Joined_data.select('Week_start_date').dropDuplicates()
I am getting the error: "Using PythonUDF in join condition of join type LeftSemi is not supported."
If I remove the dropDuplicates() call from the line above, it runs fine without any error.
Does anyone have any idea why I am getting this error with the dropDuplicates() method?