How does Spark broadcast the data when we use a broadcast join with a hint?

As far as I can see, when we use the broadcast hint, it calls this function:
def broadcast[T](df: Dataset[T]): Dataset[T] = {
  Dataset[T](df.sparkSession,
    ResolvedHint(df.logicalPlan, HintInfo(strategy = Some(BROADCAST))))(df.exprEnc)
}
This internally calls the apply method of Dataset, passing the ResolvedHint as the logical plan:
val dataset = new Dataset(sparkSession, logicalPlan, implicitly[Encoder[T]])
But what happens after this? How does it actually work, and where is the code for it?
- What if the small dataset (the one we are going to broadcast) has multiple partitions? Does Spark combine all the partitions and then broadcast them?
- Is the data sent to the driver first and then to the executors?
- What is BitTorrent?
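
For reference, here is a minimal sketch of how I am applying the hint and inspecting the resulting plans. The object name, column names, and dataset sizes are made up for illustration. It assumes a local Spark 3.x session, and it disables automatic broadcasting so that any broadcast join in the plan should come from the hint alone.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

// Hypothetical demo object; names and sizes are illustrative only.
object BroadcastHintDemo {

  // Builds a join with an explicit broadcast hint and returns the
  // physical plan rendered as a string.
  def planWithHint(spark: SparkSession): String = {
    import spark.implicits._

    // Larger side of the join.
    val large = spark.range(0, 1000000).toDF("id")

    // Small side, deliberately spread over several partitions.
    val small = Seq((1L, "a"), (2L, "b"), (3L, "c"))
      .toDF("id", "label")
      .repartition(4)

    // broadcast(...) wraps the logical plan of `small` in a ResolvedHint.
    val joined = large.join(broadcast(small), "id")

    // Prints the parsed, analyzed, optimized and physical plans.
    joined.explain(true)

    joined.queryExecution.executedPlan.toString
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("broadcast-hint-demo")
      // Turn off size-based auto broadcast so only the hint can trigger it.
      .config("spark.sql.autoBroadcastJoinThreshold", "-1")
      .getOrCreate()

    try {
      val plan = planWithHint(spark)
      println(s"Plan mentions BroadcastHashJoin: ${plan.contains("BroadcastHashJoin")}")
    } finally {
      spark.stop()
    }
  }
}
```

In the physical plan I would expect to see a broadcast exchange feeding a broadcast hash join, but the output does not tell me how the partitions of the small side actually get to the executors, which is the part I am asking about.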