
This question is related to a previous thread. I am extracting sessions from a stream of users' click events. I use a timeout of 2 minutes: if a user is inactive for 2 minutes (no click events), I assume the session has finished. These finished sessions should be collected in finishedSessions.

The code below fails with `java.io.NotSerializableException: org.testconsumer.kafka.KafkaEventsConsumer` (the error is quoted in the answers below).

   // note: no `val` here; adding it fixed the first error (see EDIT below)
   settings = ssc.sparkContext.broadcast(Map(
     "metadataBrokerList_OutputQueue" -> metadataBrokerList_OutputQueue,
     "topicOutput" -> topicOutput))

   val spec = StateSpec.function(Utils.updateState _).timeout(Minutes(2))
   // transform to (member_id, (minTime, maxTime, counter, events within session))
   val latestSessionInfo = membersSessions.map[(String, (Long, Long, Long, List[String]))](a => {
     (a._1, (a._2._1, a._2._2, 1, a._2._3))
   })

   val finishedSessions = latestSessionInfo.mapWithState(spec).filter(_.isDefined)

    finishedSessions.foreachRDD( rdd => {
      rdd.foreachPartition{iter =>
        val producer = Utils.createProducer(settings.value("metadataBrokerList_OutputQueue"))
        iter.foreach { msg =>
          val jsonString = msg.get._4.toString()
          val streamEvent = new ProducerRecord[String, String](settings.value("topicOutput"), null, jsonString)
          producer.send(streamEvent)
        }
        producer.close()
      }
    })

This is my serializable object Utils:

object Utils extends Serializable {

  def updateState(key: String,
                  value: Option[(Long, Long, Long, List[String])],
                  state: State[(Long, Long, Long, List[String])]): Option[(Long, Long, Long, List[String])] = {
    // merge two partial sessions: earliest start, latest end, summed counter, concatenated events
    def reduce(first: (Long, Long, Long, List[String]), second: (Long, Long, Long, List[String])) = {
      (Math.min(first._1, second._1), Math.max(first._2, second._2), first._3 + second._3, first._4 ++ second._4)
    }

    value match {
      case Some(currentValue) =>
        // new events arrived for this key: fold them into the state, emit nothing yet
        val result = state
          .getOption()
          .map(currentState => reduce(currentState, currentValue))
          .getOrElse(currentValue)
        state.update(result)
        None
      case _ if state.isTimingOut() =>
        // no events for 2 minutes: the session is finished, emit its accumulated value
        state.getOption()
    }
  }

  def createProducer(metadataBrokerList: String): KafkaProducer[String, String] = {

    val kafkaProps = new Properties()

    println("metadataBrokerList: " + metadataBrokerList)

    kafkaProps.put("bootstrap.servers", metadataBrokerList)

    // The serializers are mandatory, even though we don't send a key
    kafkaProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    kafkaProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    kafkaProps.put("acks", "1")

    // How many times to retry when a produce request fails
    kafkaProps.put("retries", "3")
    // Upper limit on how much data (in bytes) the producer will batch before sending
    kafkaProps.put("batch.size", "5")
    // How long the producer will wait before sending, to let more messages accumulate in the same batch
    kafkaProps.put("linger.ms", "5")

    new KafkaProducer[String, String](kafkaProps)
  }
}
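
To illustrate, `reduce` merges two partial records for the same member by keeping the earliest start time, the latest end time, the summed click counter, and the concatenated event list. A tiny sketch (values made up, not part of the job):

    // (minTime, maxTime, count, events) for the same member_id
    val first  = (1000L, 2000L, 2L, List("click_a", "click_b"))
    val second = (1500L, 4000L, 1L, List("click_c"))
    // reduce(first, second) == (1000L, 4000L, 3L, List("click_a", "click_b", "click_c"))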

EDIT:

After defining settings as a local broadcast variable (`val settings = ...` instead of `settings = ...`), the error message disappeared. However, a new error appeared, this time `Caused by: java.lang.ClassNotFoundException: java.time.temporal.TemporalField`. What does it mean?
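
For reference, the fix was only adding `val` to the broadcast declaration (a minimal sketch; the broadcast contents are unchanged):

    val settings = ssc.sparkContext.broadcast(Map(
      "metadataBrokerList_OutputQueue" -> metadataBrokerList_OutputQueue,
      "topicOutput" -> topicOutput))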

Driver stacktrace:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in stage 3.0 failed 4 times, most recent failure: Lost task 6.3 in stage 3.0 (TID 34, ip-172-20-233-19.eu-west-1.compute.internal): java.lang.NoClassDefFoundError: java/time/temporal/TemporalField
    at org.testconsumer.kafka.KafkaEventsConsumer$$anonfun$run$2$$anonfun$apply$1.apply(KafkaEventsConsumer.scala:163)
    at org.testconsumer.kafka.KafkaEventsConsumer$$anonfun$run$2$$anonfun$apply$1.apply(KafkaEventsConsumer.scala:160)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: java.time.temporal.TemporalField
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 12 more

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:920)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:918)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:918)
    at org.testconsumer.kafka.KafkaEventsConsumer$$anonfun$run$2.apply(KafkaEventsConsumer.scala:160)
    at org.testconsumer.kafka.KafkaEventsConsumer$$anonfun$run$2.apply(KafkaEventsConsumer.scala:159)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
    at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:49)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
    at scala.util.Try$.apply(Try.scala:161)
    at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:224)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:223)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError: java/time/temporal/TemporalField
    at org.testconsumer.kafka.KafkaEventsConsumer$$anonfun$run$2$$anonfun$apply$1.apply(KafkaEventsConsumer.scala:163)
    at org.testconsumer.kafka.KafkaEventsConsumer$$anonfun$run$2$$anonfun$apply$1.apply(KafkaEventsConsumer.scala:160)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    ... 3 more
Caused by: java.lang.ClassNotFoundException: java.time.temporal.TemporalField
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 12 more
Exception in thread "streaming-job-executor-0" java.lang.Error: java.lang.InterruptedException
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1151)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:503)
    at org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:612)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:920)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:918)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:918)
    at org.testconsumer.kafka.KafkaEventsConsumer$$anonfun$run$2.apply(KafkaEventsConsumer.scala:160)
    at org.testconsumer.kafka.KafkaEventsConsumer$$anonfun$run$2.apply(KafkaEventsConsumer.scala:159)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:661)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:50)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:50)
    at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:426)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:49)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
    at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:49)
    at scala.util.Try$.apply(Try.scala:161)
    at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:224)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:224)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
    at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:223)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    ... 2 more
16/11/28 18:42:08 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerStageCompleted(org.apache.spark.scheduler.StageInfo@764e333a)
16/11/28 18:42:08 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerJobEnd(1,1480354928309,JobFailed(org.apache.spark.SparkException: Job 1 cancelled because SparkContext was shut down))

It seems to be related to the Java version. I have the following version on the cluster, but then how can I deploy my code, and how can I avoid the error with java.time.temporal.TemporalField?

java version "1.7.0_111"

EDIT #2 (POM.xml):

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.test</groupId>
    <artifactId>streaming_test</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>

        <java.version>1.7</java.version>
        <scala.version>2.10.6</scala.version>
        <spark.version>1.6.2</spark.version>
        <jackson.version>2.8.3</jackson.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka_2.10</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.10</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!--<dependency>-->
        <!--<groupId>org.apache.spark</groupId>-->
        <!--<artifactId>spark-core_2.10</artifactId>-->
        <!--<version>${spark.version}</version>-->
        <!--</dependency>-->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.10</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!--<dependency>-->
        <!--<groupId>org.apache.spark</groupId>-->
        <!--<artifactId>spark-mllib_2.10</artifactId>-->
        <!--<version>${spark.version}</version>-->
        <!--</dependency>-->
        <dependency>
            <groupId>com.fasterxml.jackson.module</groupId>
            <artifactId>jackson-module-scala_2.10</artifactId>
            <version>${jackson.version}</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>${jackson.version}</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-annotations</artifactId>
            <version>${jackson.version}</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
            <version>${jackson.version}</version>
        </dependency>
        <dependency>
            <groupId>net.debasishg</groupId>
            <artifactId>redisclient_2.10</artifactId>
            <version>3.3</version>
        </dependency>
        <dependency>
            <groupId>org.sedis</groupId>
            <artifactId>sedis_2.10</artifactId>
            <version>1.2.2</version>
        </dependency>
        <dependency>
            <groupId>com.lambdaworks</groupId>
            <artifactId>jacks_2.10</artifactId>
            <version>2.3.3</version>
        </dependency>
        <dependency>
            <groupId>com.typesafe</groupId>
            <artifactId>config</artifactId>
            <version>1.3.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-aws</artifactId>
            <version>2.6.0</version>
        </dependency>
        <dependency>
            <groupId>com.amazonaws</groupId>
            <artifactId>aws-java-sdk-s3</artifactId>
            <version>1.11.53</version>
        </dependency>
        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
            <version>4.5.2</version>
        </dependency>
        <dependency>
            <groupId>net.java.dev.jets3t</groupId>
            <artifactId>jets3t</artifactId>
            <version>0.9.4</version>
        </dependency>
        <!--<dependency>-->
            <!--<groupId>com.github.nscala-time</groupId>-->
            <!--<artifactId>nscala-time_2.10</artifactId>-->
            <!--<version>2.12.0</version>-->
        <!--</dependency>-->
        <dependency>
            <groupId>com.databricks</groupId>
            <artifactId>spark-csv_2.10</artifactId>
            <version>1.5.0</version>
        </dependency>
    </dependencies>


    <build>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <executions>
                    <execution>
                        <id>build-a</id>
                        <configuration>
                            <archive>
                                <manifest>
                                    <mainClass>org.test.consumer.SessionizerRunner</mainClass>
                                </manifest>
                            </archive>
                            <descriptorRefs>
                                <descriptorRef>jar-with-dependencies</descriptorRef>
                            </descriptorRefs>
                            <finalName>myTest</finalName>
                        </configuration>
                        <phase>package</phase>
                        <goals>
                            <goal>assembly</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>

            <!-- Configure maven-compiler-plugin to use the desired Java version -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <configuration>
                    <source>${java.version}</source>
                    <target>${java.version}</target>
                </configuration>
            </plugin>

            <!-- Use build-helper-maven-plugin to add Scala source and test source directories -->
            <plugin>
                <groupId>org.codehaus.mojo</groupId>
                <artifactId>build-helper-maven-plugin</artifactId>
                <version>1.10</version>
                <executions>
                    <execution>
                        <id>add-source</id>
                        <phase>generate-sources</phase>
                        <goals>
                            <goal>add-source</goal>
                        </goals>
                        <configuration>
                            <sources>
                                <source>src/main/scala</source>
                            </sources>
                        </configuration>
                    </execution>
                    <execution>
                        <id>add-test-source</id>
                        <phase>generate-test-sources</phase>
                        <goals>
                            <goal>add-test-source</goal>
                        </goals>
                        <configuration>
                            <sources>
                                <source>src/test/scala</source>
                            </sources>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

            <!-- Use scala-maven-plugin for Scala support -->
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <goals>
                            <!-- Need to specify this explicitly, otherwise plugin won't be called when doing e.g. mvn compile -->
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

</project>
duckertito

2 Answers


It seems to be related to the Java version. I have the following version on the cluster, but then how can I deploy my code, and how can I avoid the error with java.time.temporal.TemporalField?

The `java.time` package was added in Java 8. Search for any uses of `java.time.` in your code. The current Spark version requires only Java 7, and so does Kafka, so they shouldn't be the problem (but check any other libraries you use).
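
For example, if the year computation mentioned in the comments below turns out to be the culprit, a Java 7-compatible equivalent using Joda-Time (which `nscala-time` already depends on) could look like this. This is only a sketch; `dt` is assumed to be an `org.joda.time.DateTime`:

    import org.joda.time.DateTime

    // Java 7-safe replacement for java.time.LocalDate.now.get(ChronoField.YEAR) - dt.getYear
    def yearsSince(dt: DateTime): Int =
      DateTime.now.getYear - dt.getYear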

Alexey Romanov
  • Thank you so much. I have this import: `import com.github.nscala_time.time.Imports._`. Could it cause the problem? I can post my POM.xml if that helps. – duckertito Nov 28 '16 at 19:09
  • `<dependency> <groupId>com.github.nscala-time</groupId> <artifactId>nscala-time_2.10</artifactId> <version>2.12.0</version> </dependency>` – duckertito Nov 28 '16 at 19:11
  • No, `nscala-time` uses Joda-Time, which is compatible with Java 7. OTOH, if you use a Spark version which runs on Scala 2.11, you should change this to `nscala-time_2.11` as well (and any other `_2.10` dependencies). – Alexey Romanov Nov 28 '16 at 19:16
  • No, I use Scala 2.10. I posted my POM.xml. Indeed, I did find `import java.time.temporal.ChronoField` in my imports list, though it appears to be unused. Why then does it trigger the error if I am not using it? – duckertito Nov 28 '16 at 19:22
  • At least Typesafe Config 1.3 requires Java 8. I don't know about other dependencies in your list. You can try to use http://depfind.sourceforge.net/ to look for other uses. – Alexey Romanov Nov 28 '16 at 19:31
  • I assume that the problem was related to `java.time.LocalDate.now.get(ChronoField.YEAR) - dt.getYear` used inside `Utils`. Though I was not calling the function that uses this line of code, the class still had to load and could not find `java.time.temporal.ChronoField`. Could that be the reason? – duckertito Nov 28 '16 at 19:33
  • Yes, loading `Utils` requires loading all classes it uses. – Alexey Romanov Nov 28 '16 at 19:42
  • Ok, tomorrow I will check if it was the reason of the error. I will accept the answer if it's so. Thank you! – duckertito Nov 28 '16 at 20:19
  • Note that fixing this won't mean there are no problems remaining (you can have the same issue with your use of Config, as mentioned); it's possible your current error happens even before the JVM tries to load `Utils`. – Alexey Romanov Nov 28 '16 at 20:29
  • I excluded both Typesafe Config and java.time. Let's see what happens. – duckertito Nov 28 '16 at 20:42

Caused by: java.io.NotSerializableException: org.testconsumer.kafka.KafkaEventsConsumer

Your error trace clearly shows the problem: `org.testconsumer.kafka.KafkaEventsConsumer` is not serializable. The `KafkaProducer` it references cannot be serialized, so you have to annotate your Kafka fields as `@transient`, or you can get help from this.
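
A minimal sketch of that pattern (the class name matches the one in the stack trace, but the fields and structure here are illustrative):

    import org.apache.kafka.clients.producer.KafkaProducer

    class KafkaEventsConsumer(brokers: String) extends Serializable {
      // @transient excludes the producer from serialization;
      // lazy re-creates it on each executor after deserialization
      @transient lazy val producer: KafkaProducer[String, String] =
        Utils.createProducer(brokers)
    }

Alternatively, keep creating the producer inside `foreachPartition`, as the question's code already does, so the producer never enters a serialized closure.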

Balaji Reddy
  • I solved this problem by making `settings` a local broadcast variable, i.e. `val settings = ...` instead of `settings = ...`. However, now I get a new error, and it looks unrelated to serialization. I really don't understand what is happening with my code, given that I have successfully deployed similar pieces of code. Please see my update. – duckertito Nov 28 '16 at 17:53