
GraphFrames' connectedComponents throws an exception when I try to run my Spark job from Databricks Connect. Here is the configuration I am using for the Spark session:

from pyspark.sql import SparkSession

spark = (
    SparkSession
    .builder
    .config(
        "spark.jars.packages",
        "graphframes:graphframes:0.8.2-spark3.2-s_2.12"
    )
    .getOrCreate()
)
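
As a quick sanity check (a sketch, not part of the original job), the package coordinate can be read back from the session configuration to confirm it was actually set:

# Sketch: verify the builder config reached the session
print(spark.sparkContext.getConf().get("spark.jars.packages", "not set"))
# expected: graphframes:graphframes:0.8.2-spark3.2-s_2.12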

Code:

from graphframes import GraphFrame

edges = genealogy.toDF('src', 'dst')
g = GraphFrame(vertices, edges)

spark.sparkContext.setCheckpointDir('/')  # connectedComponents requires a checkpoint directory
cc = g.connectedComponents()  # .cache()
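
For reference, here is a minimal, self-contained version of the snippet; the vertices and genealogy values below are toy stand-ins for the real data (assumptions, for illustration only), and the checkpoint path is arbitrary:

from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.getOrCreate()

# Toy inputs standing in for the real data (assumed shapes)
vertices = spark.createDataFrame([(1,), (2,), (3,), (4,)], ['id'])
genealogy = spark.sparkContext.parallelize([(1, 2), (2, 3)])

edges = genealogy.toDF('src', 'dst')
g = GraphFrame(vertices, edges)

spark.sparkContext.setCheckpointDir('/tmp/graphframes-checkpoints')
cc = g.connectedComponents()
cc.show()  # each vertex comes back labelled with a 'component' id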

Exception:

py4j.protocol.Py4JJavaError: An error occurred while calling o120.run.
: java.util.NoSuchElementException: None.get
        at scala.None$.get(Option.scala:529)
        at scala.None$.get(Option.scala:527)
        at org.apache.spark.sql.util.ProtoSerializer$.fromOption(ProtoSerializer.scala:121)
        at org.apache.spark.sql.util.ProtoSerializer.deserializeExpr(ProtoSerializer.scala:7393)
        at org.apache.spark.sql.util.ProtoSerializer.$anonfun$deserializeExpr$5(ProtoSerializer.scala:7411)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
        at scala.collection.Iterator.foreach(Iterator.scala:943)
        at scala.collection.Iterator.foreach$(Iterator.scala:943)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
        at scala.collection.IterableLike.foreach(IterableLike.scala:74)
        at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
        at scala.collection.TraversableLike.map(TraversableLike.scala:286)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
        at scala.collection.AbstractTraversable.map(Traversable.scala:108)
        at org.apache.spark.sql.util.ProtoSerializer.deserializeExpr(ProtoSerializer.scala:7411)
        at org.apache.spark.sql.util.ProtoSerializer.deserializePlan0(ProtoSerializer.scala:4937)
        at org.apache.spark.sql.util.ProtoSerializer.deserializePlan(ProtoSerializer.scala:4701)
        at com.databricks.service.SparkServiceRPCHandler.execute0(SparkServiceRPCHandler.scala:666)
        at com.databricks.service.SparkServiceRPCHandler.$anonfun$executeRPC0$1(SparkServiceRPCHandler.scala:477)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
        at com.databricks.service.SparkServiceRPCHandler.executeRPC0(SparkServiceRPCHandler.scala:372)
        at com.databricks.service.SparkServiceRPCHandler$$anon$2.call(SparkServiceRPCHandler.scala:323)
        at com.databricks.service.SparkServiceRPCHandler$$anon$2.call(SparkServiceRPCHandler.scala:309)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at com.databricks.service.SparkServiceRPCHandler.$anonfun$executeRPC$1(SparkServiceRPCHandler.scala:359)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
        at com.databricks.service.SparkServiceRPCHandler.executeRPC(SparkServiceRPCHandler.scala:336)
        at com.databricks.service.SparkServiceRPCServlet.doPost(SparkServiceRPCServer.scala:167)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:523)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:590)
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.eclipse.jetty.server.Server.handle(Server.java:516)
        at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
        at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
        at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
        at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:386)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
        at java.lang.Thread.run(Thread.java:748)

The same code runs fine in a Databricks notebook. I checked whether anything had been written to the checkpoint directory and found nothing there. What am I missing? Is it an installation issue, and if so, what is the correct way to install GraphFrames? Or is there something else I am overlooking?

Cluster Details:

10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12)
