
We have a Spark Streaming job in which I create a SQLContext inside the DStream foreachRDD method. The reason I create it inside foreachRDD instead of outside is that when I enable checkpointing, the job fails saying the SQLContext is not serializable.

So I create it inside the foreachRDD method, and then it works fine.

But I want to close this SQLContext properly at the end of the foreachRDD method. How can I do that?

Sample code:

dStream.foreachRDD { rdd =>
  val sparkContext = rdd.sparkContext
  var sqlContext = new HiveContext(sparkContext)

  // convert rdd to a DataFrame using sqlContext and process it
  ......
  ......

  // finally close the sqlContext
  sqlContext = null
}
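For reference, here is a minimal sketch of the lazily-initialized singleton pattern from the Spark Streaming programming guide, adapted to HiveContext (the `HiveContextSingleton` object below is only illustrative, not part of my code):

import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

// One HiveContext per JVM, created lazily and reused by every batch
object HiveContextSingleton {
  @transient private var instance: HiveContext = _

  def getInstance(sparkContext: SparkContext): HiveContext = {
    if (instance == null) {
      instance = new HiveContext(sparkContext)
    }
    instance
  }
}

dStream.foreachRDD { rdd =>
  // reuse the same context instead of creating a new one per batch
  val sqlContext = HiveContextSingleton.getInstance(rdd.sparkContext)

  // convert rdd to a DataFrame using sqlContext and process it
  // ...
}

With a single reused context there would be nothing to close at the end of each batch, but I am not sure whether that is the intended way to handle it.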
Shankar
  • Why do you want to close the `SQLContext` at the end of each batch? – Yuval Itzchakov Feb 16 '17 at 07:14
  • When I look at the Spark UI, I can see multiple SQL tabs open; one SQLContext is created for each batch interval. Will they get closed automatically? – Shankar Feb 16 '17 at 09:25
  • Perhaps you want [`SQLContext.getOrCreate`](https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.SQLContext$@getOrCreate(sparkContext:org.apache.spark.SparkContext):org.apache.spark.sql.SQLContext) (although I see it's deprecated)? – Yuval Itzchakov Feb 16 '17 at 09:28
  • @YuvalItzchakov: The problem is that we want a HiveContext, not a SQLContext, and I don't see this method on HiveContext. – Shankar Feb 16 '17 at 09:38
  • @YuvalItzchakov: Do you have any idea how we can prevent this? It is showing multiple SQL tabs on the Spark UI. – Shankar Feb 17 '17 at 04:26

0 Answers