In Apache Spark, there's a function called broadcast, which marks a DataFrame as small enough to be broadcast to every executor in a join. However, what if I want to do the opposite?
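For context, this is roughly how I'm using the hint today; it's just a minimal sketch, and the paths and the join column ("id") are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder().appName("broadcast-hint-example").getOrCreate()

// Hypothetical inputs: a large fact table and a small lookup table.
val large = spark.read.parquet("/data/large_table")
val small = spark.read.parquet("/data/small_lookup")

// broadcast() marks `small` so the optimizer plans a broadcast join,
// shipping the whole of `small` to every executor.
val joined = large.join(broadcast(small), Seq("id"))
```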
Even after adjusting the broadcast threshold (spark.sql.autoBroadcastJoinThreshold), there are times when Spark still tries to broadcast a DataFrame that is too large, and the tasks fail. Is it possible to do the opposite of the broadcast function and explicitly prevent Spark from broadcasting a specific DataFrame?
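To be concrete, by "adjusting the broadcast threshold" I mean changes along these lines (continuing with the same spark session as above; the sizes are just examples):

```scala
// Spark auto-broadcasts any side of a join whose estimated size is below
// spark.sql.autoBroadcastJoinThreshold (default 10485760 bytes = 10 MB).
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 50L * 1024 * 1024) // e.g. raise to ~50 MB

// Setting it to -1 disables automatic broadcasting globally, but that is a blanket
// switch, not a per-DataFrame "never broadcast this one" marker.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
```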