
I created a Spark DataFrame in a Python paragraph in Zeppelin:

sqlCtx = SQLContext(sc)
spDf = sqlCtx.createDataFrame(df)

where df is a pandas DataFrame:

print(type(df))
<class 'pandas.core.frame.DataFrame'>

What I want to do is move spDf from the Python paragraph to a Scala paragraph. A reasonable way to do this seems to be z.put:

z.put("spDf", spDf)

and I got this error:

AttributeError: 'DataFrame' object has no attribute '_get_object_id'

Any suggestions for fixing the error, or another way to move spDf?

zero323
MTT

1 Answer


You can put the internal Java object, not the Python wrapper:

%pyspark

df = sc.parallelize([(1, "foo"), (2, "bar")]).toDF(["k", "v"])
z.put("df", df._jdf)

and then, on the Scala side, cast it back to the correct type:

val df = z.get("df").asInstanceOf[org.apache.spark.sql.DataFrame]
// df: org.apache.spark.sql.DataFrame = [k: bigint, v: string]

but it is better to register a temporary table:

%pyspark

# registerTempTable in Spark 1.x
df.createTempView("df")

and read it back with spark.table:

// sqlContext.table in Spark 1.x
val df = spark.table("df")
// df: org.apache.spark.sql.DataFrame = [k: bigint, v: string]

To convert in the opposite direction, see Zeppelin: Scala Dataframe to python.
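The linked reverse direction works the same way in miniature: the Scala paragraph z.puts its DataFrame (Scala DataFrames are already JVM objects), and the Python paragraph rebuilds a wrapper around the handle. A sketch of the Python side, assuming Zeppelin's z and an active sqlContext; note that the DataFrame constructor taking a JVM handle is pyspark-internal and may change between versions:

```python
%pyspark
from pyspark.sql import DataFrame

# the Scala paragraph ran: z.put("df", df)
jdf = z.get("df")                  # raw py4j handle to the JVM DataFrame
pyDf = DataFrame(jdf, sqlContext)  # rewrap it as a Python DataFrame
```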

zero323