
I need to transform a join operation in Spark SQL into a custom join (that is, turn the logical plan into a custom physical plan). I have written a strategy that transforms the Spark join operation into a custom join:

object CustomStrategy extends Strategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case Join(left, right, Inner, Some(condition)) =>
      CustomJoin(df1, df2, left.output ++ right.output) :: Nil
    case _ => Nil
  }
}

Is it possible to express the CustomJoin operation on DataFrames rather than on LogicalPlans, i.e. taking two DataFrames as inputs?

syl

1 Answer


No. You should assemble the execution tree from SparkPlan objects (not even LogicalPlan ones!), and you cannot use DataFrames at the physical level, since a DataFrame is itself the subject of plan generation.
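To see why: a DataFrame is just a handle on a logical plan that Spark has yet to plan physically. A quick illustration (assuming a SparkSession named spark):

// A DataFrame wraps a LogicalPlan; the SparkPlan is derived from it by the planner.
val df = spark.range(10).toDF("id")
df.queryExecution.logical     // the LogicalPlan that strategies pattern-match on
df.queryExecution.sparkPlan   // the SparkPlan produced by the strategies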

However, you can call the planLater(logicalPlan) method to ask later strategies to provide a SparkPlan, and pass the result to your CustomJoin. Then, inside the doExecute method, you can call the execute method of the children to obtain RDDs.
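For illustration, here is a minimal sketch against the Spark 2.x APIs of how that could fit together. CustomJoin is the hypothetical operator from the question, fleshed out as a physical node; the actual join logic inside doExecute is elided, and note that on Spark 3 the Join pattern takes an extra hint field:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Strategy
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
import org.apache.spark.sql.catalyst.plans.Inner
import org.apache.spark.sql.catalyst.plans.logical.{Join, LogicalPlan}
import org.apache.spark.sql.execution.{BinaryExecNode, SparkPlan}

// Hypothetical physical operator: its children are SparkPlans, not DataFrames.
case class CustomJoin(
    condition: Expression,
    left: SparkPlan,
    right: SparkPlan) extends BinaryExecNode {

  override def output: Seq[Attribute] = left.output ++ right.output

  override protected def doExecute(): RDD[InternalRow] = {
    // Calling execute() on the children yields their RDDs.
    val leftRdd = left.execute()
    val rightRdd = right.execute()
    // ... actual join of leftRdd and rightRdd goes here ...
    ???
  }
}

object CustomStrategy extends Strategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case Join(left, right, Inner, Some(condition)) =>
      // planLater defers planning of each child to the remaining strategies.
      CustomJoin(condition, planLater(left), planLater(right)) :: Nil
    case _ => Nil
  }
}

The strategy can then be registered with spark.experimental.extraStrategies = CustomStrategy :: Nil.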

schernichkin