
Can one export the logical or physical plan of a Spark DataFrame/Dataset, serialize it, and save it somewhere (as text, XML, JSON, ...)? Then re-import it and create a DataFrame based on it?

The idea is that I'm interested in having a metastore for Spark DataFrames where I can save their logical or physical plans, so that others can reuse them.
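
For illustration, something along these lines is roughly what I mean by "export and save a plan" (a minimal sketch; the toString/toJSON calls on the logical plan and the /tmp path are just one possible way to dump it):

import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

val df = spark.range(10).toDF("id")

val planAsText = df.queryExecution.logical.toString // human-readable logical plan
val planAsJson = df.queryExecution.logical.toJSON   // JSON form of the plan tree

// save it somewhere (a local path, purely as an example) so others could pick it up later
Files.write(Paths.get("/tmp/logical-plan.json"), planAsJson.getBytes(StandardCharsets.UTF_8))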

  • Do these answers help? [In spark, is it possible to reuse a DataFrame's execution plan to apply it to different data sources](https://stackoverflow.com/questions/58932701/in-spark-is-it-possible-to-reuse-a-dataframes-execution-plan-to-apply-it-to-di), [How do I get a spark dataframe to print it's explain plan to a string](https://stackoverflow.com/questions/55614122/how-do-i-get-a-spark-dataframe-to-print-its-explain-plan-to-a-string) – mazaneicha Jun 09 '20 at 16:51
  • No, this is not even close. – Hamza EL KAROUI Jun 10 '20 at 05:34

1 Answer


Spark 2.4.2 (the code below may differ for lower versions of Spark).

Check the code below.

import spark.implicits._  // needed for Seq(...).toDS

// Each line dumps one stage of the query plan as JSON (save() writes a directory).
spark.read.json(Seq(df.queryExecution.logical.toJSON).toDS).write.format("json").save("logical")           // parsed logical plan
spark.read.json(Seq(df.queryExecution.sparkPlan.toJSON).toDS).write.format("json").save("sparkPlan")       // physical plan before preparation rules
spark.read.json(Seq(df.queryExecution.executedPlan.toJSON).toDS).write.format("json").save("executedPlan") // final physical plan that will run
spark.read.json(Seq(df.queryExecution.analyzed.toJSON).toDS).write.format("json").save("analyzed")         // analyzed (resolved) logical plan
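
If you only want to load a serialized plan back for inspection, reading the saved directory works; a minimal sketch, assuming the "logical" directory written above. Note that this gives you the plan description as JSON rows, not a DataFrame that re-executes the original query:

// read the serialized logical plan back from the "logical" directory written above
val reloadedPlan = spark.read.json("logical")
reloadedPlan.show(false)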

Srinivas
  • I tried reading a CSV file and exported the plans as you suggested. None of them contained the path of my source, and I think they only point to objects in the JVM (I'm not sure). – Hamza EL KAROUI Jun 10 '20 at 05:29
  • Can you please tell me what you are expecting from these? – Srinivas Jun 10 '20 at 06:10
  • I was expecting to have all the information, like the source files, etc. My idea is to save only the Spark plans to a file, then re-import and evaluate them as actual DataFrames so they can be reused later by other users. In other words, I'm trying to build a sort of store for DataFrame plans (without persisting data) that can be shared across the company. – Hamza EL KAROUI Jun 10 '20 at 06:31
  • @Srinivas is there a way to instantiate a DataFrame (Scala or Python) object from a physical plan by chance? I have been digging and couldn't find a way to make a PySpark DataFrame from a plan :/ – Stepan Ulyanin Oct 24 '21 at 21:55