I'm joining some DataFrames together in Spark and I keep getting the following error:
PartitioningCollection requires all of its partitionings have the same numPartitions.
It happens after I join two DataFrames that each look reasonable on their own: if I then try to get a row from the joined DataFrame, I get this error. I'm mainly trying to understand why this error might appear and what it means, since I can't find any documentation on it.
The following invocation results in this exception:
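// assumes import spark.implicits._ for the $"..." column syntax
val resultDataframe = dataFrame1
  .join(dataFrame2, $"first_column" === $"second_column")
resultDataframe.take(2) // throws the exception above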
but calling take(2) on either input by itself works fine:
dataFrame1.take(2)
dataFrame2.take(2)
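My best guess at what the message means (an assumption, not something I found documented) is that it's a failed precondition. If an inner join describes its output as partitioned both like its left input and like its right input, then every description in that collection has to agree on the number of partitions, and the check fails when they don't. Below is a simplified model of that guess in plain Scala. ToyPartitioning, ToyPartitioningCollection and innerJoinPartitioning are hypothetical stand-ins, not Spark classes, and the partition counts are made up. It reproduces the pattern I'm seeing: each input is fine on its own, and only the combination trips the check.

```scala
// Hypothetical stand-in for a physical partitioning: the key columns rows are
// hashed on, and how many partitions there are. Not a Spark class.
final case class ToyPartitioning(keys: Seq[String], numPartitions: Int)

// Hypothetical stand-in for a "partitioning collection": several descriptions of
// the same physical layout, which therefore must agree on numPartitions.
final case class ToyPartitioningCollection(partitionings: Seq[ToyPartitioning]) {
  require(
    partitionings.map(_.numPartitions).distinct.length == 1,
    "PartitioningCollection requires all of its partitionings have the same numPartitions.")
}

// Assumption: an inner join reports its output as partitioned like BOTH children.
def innerJoinPartitioning(left: ToyPartitioning, right: ToyPartitioning): ToyPartitioningCollection =
  ToyPartitioningCollection(Seq(left, right))

// Each side on its own is a perfectly valid partitioning (made-up counts)...
val left = ToyPartitioning(Seq("first_column"), 200)
val right = ToyPartitioning(Seq("second_column"), 50)

// ...but combining them fails the require(...) with an IllegalArgumentException.
val outcome = scala.util.Try(innerJoinPartitioning(left, right))
println(outcome)
```

Running it prints a Failure wrapping an IllegalArgumentException whose message contains the same text as the Spark error; giving right 200 partitions as well makes the check pass.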
I also tried repartitioning the DataFrames, using Dataset.repartition(numPartitions) or Dataset.coalesce(numPartitions) on dataFrame1 and dataFrame2 before joining, and on resultDataframe after the join, but none of this seemed to affect the error. After some cursory googling, I haven't found any reports of other people getting this error either.
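For reference, this is my understanding of how the two calls differ in their effect on the partition count, written as a simplified model based on the Dataset scaladoc. The helper names are my own and are not Spark APIs, and the input counts are made up. If the model is right, coalesce can only merge partitions, so of the two calls only repartition can make both inputs end up with the same number of partitions when one of them starts with fewer.

```scala
// Hypothetical helpers modelling the documented effect of Dataset.repartition(n)
// and Dataset.coalesce(n) on the number of partitions (not Spark APIs).

// repartition(n) shuffles the data, so the result has exactly n partitions.
def partitionsAfterRepartition(current: Int, requested: Int): Int = {
  require(requested > 0, "number of partitions must be positive")
  requested
}

// coalesce(n) avoids a full shuffle and can only merge existing partitions,
// so asking for more partitions than exist leaves the count unchanged.
def partitionsAfterCoalesce(current: Int, requested: Int): Int = {
  require(requested > 0, "number of partitions must be positive")
  math.min(current, requested)
}

// Two made-up inputs with 8 and 200 partitions, both asked for 100.
val viaRepartition = (partitionsAfterRepartition(8, 100), partitionsAfterRepartition(200, 100))
val viaCoalesce = (partitionsAfterCoalesce(8, 100), partitionsAfterCoalesce(200, 100))
println(s"repartition(100): $viaRepartition, coalesce(100): $viaCoalesce")
```

With these numbers, repartition(100) gives (100, 100), while coalesce(100) gives (8, 100).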