
I want to write to and update Kudu tables through the Kudu API. These are my Maven dependencies:

<dependency>
  <groupId>org.apache.kudu</groupId>
  <artifactId>kudu-client</artifactId>
  <version>1.1.0</version>
</dependency>
<dependency>
  <groupId>org.apache.kudu</groupId>
  <artifactId>kudu-spark2_2.11</artifactId>
  <version>1.1.0</version>
</dependency>

In the following code, I do not understand what parameters the KuduContext constructor expects.

My code in spark2-shell:

val kuduContext = new KuduContext("master:7051") 

I also get the same error in Spark 2.1 Streaming:

import org.apache.kudu.spark.kudu._
import org.apache.kudu.client._
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils

val sparkConf = new SparkConf().setAppName("DirectKafka").setMaster("local[*]")
val ssc = new StreamingContext(sparkConf, Seconds(2))
val messages = KafkaUtils.createDirectStream("")
messages.foreachRDD(rdd => {
   val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
   import spark.implicits._
   val bb = spark.read.options(Map("kudu.master" -> "master:7051","kudu.table" -> "table")).kudu //good 
   val kuduContext = new KuduContext("master:7051") //error
})

Then the error:

org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at: org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)

Shaido
Autumn

1 Answer


Update your version of Kudu to the latest one (currently 1.5.0). In later versions the KuduContext constructor takes the SparkContext as an additional input parameter, which should prevent this problem.
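For example, the updated Maven dependencies would look like the following (assuming Spark 2.x with Scala 2.11, with artifact names matching the ones already in the question):

```xml
<dependency>
  <groupId>org.apache.kudu</groupId>
  <artifactId>kudu-client</artifactId>
  <version>1.5.0</version>
</dependency>
<dependency>
  <groupId>org.apache.kudu</groupId>
  <artifactId>kudu-spark2_2.11</artifactId>
  <version>1.5.0</version>
</dependency>
```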

Also, do the initial Spark setup outside of the foreachRDD. In the code you provided, move both spark and kuduContext out of the foreachRDD. You also do not need to create a separate SparkConf; the newer SparkSession builder is enough on its own.

import org.apache.kudu.spark.kudu._
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("DirectKafka").master("local[*]").getOrCreate()
import spark.implicits._

val kuduContext = new KuduContext("master:7051", spark.sparkContext)
val bb = spark.read.options(Map("kudu.master" -> "master:7051", "kudu.table" -> "table")).kudu

val messages = KafkaUtils.createDirectStream("")
messages.foreachRDD(rdd => {   
  // do something with the bb table and messages       
})
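As a follow-up sketch (not from the original question): once kuduContext is created outside the loop, the streamed records can be written to Kudu with the KuduContext write methods such as upsertRows, which inserts new rows and updates existing ones by primary key. The payload-to-DataFrame conversion below is a placeholder assumption, since the question elides the Kafka record type and the target table schema.

```scala
import org.apache.kudu.spark.kudu._

// Assumes `spark`, `kuduContext` and `messages` were created outside the
// loop as shown above, and that the Kudu table "table" already exists.
messages.foreachRDD(rdd => {
  import spark.implicits._
  // Hypothetical conversion: take each Kafka record's value as one column.
  val df = rdd.map(record => record.value).toDF("value")
  // Upsert into the Kudu table: insert new rows, update rows whose
  // primary key already exists.
  kuduContext.upsertRows(df, "table")
})
```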
Shaido
  • @cricket_007: with kudu-spark2_2.11 1.1.0, it seems the KuduContext (org.apache.kudu.spark.kudu) takes only one parameter – Autumn Jan 10 '18 at 03:22
  • I put the Spark initialization inside the foreachRDD because of the Spark Streaming docs. Outside the foreachRDD there is val ssc = new StreamingContext(sparkConf, Seconds(2)). – Autumn Jan 10 '18 at 03:40
  • @Autumn: There should never be a need to do that kind of initialization inside a foreach. Where did you see it? – Shaido Jan 10 '18 at 03:44
  • It's in the docs: https://spark.apache.org/docs/2.1.0/streaming-programming-guide.html#dataframe-and-sql-operations – Autumn Jan 10 '18 at 03:47
  • @Autumn: Interesting, you learn something new every day. Although I guess it should only be necessary if the configuration changes between the streamed dataframes, so in most cases it should be fine on the outside. An important difference between the doc and what you used is that the `SparkSession` is not actually created in the `foreachRDD` in the docs; it is retrieving an existing session. – Shaido Jan 10 '18 at 03:53
  • @Autumn: Looking at the [source code](https://github.com/apache/spark/blob/v2.1.0/examples/src/main/scala/org/apache/spark/examples/streaming/SqlNetworkWordCount.scala) linked from the documentation, they actually define a `SparkSessionSingleton` object that they use inside the loop. – Shaido Jan 10 '18 at 05:15
  • Which is defined at the bottom of the class as just that one line – OneCricketeer Jan 10 '18 at 05:36
  • @cricket_007: Do you know if there is any good reason why `SparkSessionSingleton` is created and used instead of simply using `SparkSession`? Something to do with the configuration? I.e. that `rdd.sparkContext.getConf` is used in the loop. – Shaido Jan 10 '18 at 05:43
  • It's a singleton, so `getOrCreate` is only called once? I don't know why the RDD context would be different – OneCricketeer Jan 10 '18 at 05:44
  • @cricket_007: It would only be called once. However, I thought `getOrCreate` already takes care of that; in other words, it will create the object if it doesn't exist, otherwise just retrieve it. It feels like it is a singleton around a singleton... – Shaido Jan 10 '18 at 05:48
  • @Shaido, thanks for your patient reply. The source above is the same as my code. Whether I use SparkSessionSingleton or spark inside the loop, the error is the same whether kuduContext is inside or outside. – Autumn Jan 10 '18 at 05:51
  • @Autumn: Can you try adding a second parameter when creating the `KuduContext`? I saw here: https://kudu.apache.org/docs/developing.html that it may be necessary for version 1.1.0 as well (when using Spark 2.0+). Added it to the code in the answer. – Shaido Jan 10 '18 at 06:28
  • @Shaido: I can't; it warns when I add 2 parameters. In IDEA, it points out "val kuduContext = new KuduContext("org.apache.kudu.spark.kudu")" – Autumn Jan 10 '18 at 06:36
  • @Autumn: I see. Is updating Kudu a possibility (1.5.0 is the newest version)? – Shaido Jan 10 '18 at 06:38
  • @Shaido: The 1.5.0 KuduContext also seems to need only one parameter. By the way, KuduContext appears deprecated in 1.5.0 (I use Spark 2.1 and Scala 2.11). Now I'm looking for an API that can create, delete, or write to Kudu tables in place of KuduContext. – Autumn Jan 10 '18 at 06:57
  • @Autumn: According to https://kudu.apache.org/releases/1.5.0/docs/developing.html#_kudu_integration_with_spark there should be two parameters in version 1.5. Since the sparkContext is used as input it should (hopefully) solve the problem. – Shaido Jan 10 '18 at 07:48
  • @Shaido: Oh yes, it takes two parameters in version 1.5. Thanks a lot. Could you update your answer? I would accept it. – Autumn Jan 10 '18 at 08:14
  • @Shaido: Unluckily, I've hit other issues that are different from this question, so I opened a new [question](https://stackoverflow.com/questions/48183107/kudu-client-has-already-been-closed-in-spark-streaming) – Autumn Jan 10 '18 at 08:27
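For reference, the `SparkSessionSingleton` pattern discussed in the comments above is roughly the following, paraphrased from the Spark 2.1.0 SqlNetworkWordCount example linked in the thread:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

/** Lazily instantiated singleton instance of SparkSession,
  * as defined at the bottom of the Spark streaming example. */
object SparkSessionSingleton {

  // @transient so the cached session is not serialized to executors.
  @transient private var instance: SparkSession = _

  def getInstance(sparkConf: SparkConf): SparkSession = {
    if (instance == null) {
      // Created once on first call; later calls return the cached session.
      instance = SparkSession
        .builder
        .config(sparkConf)
        .getOrCreate()
    }
    instance
  }
}
```

Inside the foreachRDD the example then calls `SparkSessionSingleton.getInstance(rdd.sparkContext.getConf)`, which retrieves the existing session rather than creating a new SparkContext.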