
I'm trying to run the KafkaWordCount example from Spark Streaming, using Spark 2.1.1 in standalone cluster mode. The Kafka version on the server I'm integrating with is 0.10.0.1 (the Scala 2.11 build). According to https://spark.apache.org/docs/latest/streaming-kafka-integration.html there are two separate integration packages: one for Kafka 0.8.2.1 or higher and another for 0.10.0 or higher.

I've added the following JARs to the `jars` folder under the Spark home:

    kafka_2.11-0.10.0.1.jar
    spark-streaming-kafka-0-10-assembly_2.11-2.1.1.jar
    spark-streaming-kafka-0-10_2.11-2.1.1.jar
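As an alternative to copying JARs into `$SPARK_HOME/jars` by hand, `spark-submit` can resolve the integration package and its transitive Kafka dependencies at launch time with `--packages`. A sketch (the coordinate below assumes the Spark 2.1.1 / Scala 2.11 build used throughout this post):

    # Resolve the Kafka 0.10 integration from Maven Central at submit time
    /usr/local/spark/bin/spark-submit \
      --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.1.1 \
      --num-executors 1 --executor-memory 20G --total-executor-cores 4 \
      --class org.apache.spark.examples.streaming.KafkaWordCount \
      /usr/local/spark/examples/jars/spark-examples_2.11-2.1.1.jar \
      10.0.16.96:2181 group_test topic 6

Note that the bundled KafkaWordCount example is compiled against the 0-8 API (the stack trace below references `org.apache.spark.streaming.kafka.KafkaUtils$`, which lives in the 0-8 package), so to run that exact class the matching coordinate would be `org.apache.spark:spark-streaming-kafka-0-8_2.11:2.1.1` instead.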

Running this command:

    /usr/local/spark/bin/spark-submit --num-executors 1 --executor-memory 20G --total-executor-cores 4 --class org.apache.spark.examples.streaming.KafkaWordCount /usr/local/spark/examples/jars/spark-examples_2.11-2.1.1.jar 10.0.16.96:2181 group_test topic 6

shows:

    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/kafka/KafkaUtils$
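The missing class `org.apache.spark.streaming.kafka.KafkaUtils$` belongs to the 0-8 integration package, which the bundled example is compiled against, so adding only the 0-10 JARs cannot satisfy it. For a 0.10.0.1 broker, a word count written directly against the 0-10 API would look roughly like the sketch below (assumptions: the broker listens on the default port 9092 at the same host, and the topic/group names from the command above; note the 0-10 direct stream takes the Kafka broker address, not the ZooKeeper address on port 2181 that the 0-8 receiver example takes):

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object Kafka010WordCount {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("Kafka010WordCount")
        val ssc = new StreamingContext(conf, Seconds(2))

        // Broker (not ZooKeeper) address -- 9092 is an assumed placeholder
        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "10.0.16.96:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "group_test",
          "auto.offset.reset" -> "latest",
          "enable.auto.commit" -> (false: java.lang.Boolean)
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("topic"), kafkaParams))

        // Count words per batch from the record values
        stream.map(_.value)
          .flatMap(_.split(" "))
          .map((_, 1L))
          .reduceByKey(_ + _)
          .print()

        ssc.start()
        ssc.awaitTermination()
      }
    }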

Is there any other JAR that I'm missing?

Logs:

    /usr/local/spark/bin/spark-submit --num-executors 1 --executor-memory 20G --total-executor-cores 4 --class org.apache.spark.examples.streaming.KafkaWordCount /usr/local/spark/examples/jars/spark-examples_2.11-2.1.1.jar 10.0.16.96:2181 group_test streams 6
Warning: Ignoring non-spark config property: fs.s3.awsAccessKeyId=<redacted>
Warning: Ignoring non-spark config property: fs.s3.awsSecretAccessKey=<redacted>
17/07/11 08:04:31 INFO spark.SparkContext: Running Spark version 2.1.1
17/07/11 08:04:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/07/11 08:04:31 INFO spark.SecurityManager: Changing view acls to: mahendra
17/07/11 08:04:31 INFO spark.SecurityManager: Changing modify acls to: mahendra
17/07/11 08:04:31 INFO spark.SecurityManager: Changing view acls groups to:
17/07/11 08:04:31 INFO spark.SecurityManager: Changing modify acls groups to:
17/07/11 08:04:31 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(mahendra); groups with view permissions: Set(); users  with modify permissions: Set(mahendra); groups with modify permissions: Set()
17/07/11 08:04:32 INFO util.Utils: Successfully started service 'sparkDriver' on port 38173.
17/07/11 08:04:32 INFO spark.SparkEnv: Registering MapOutputTracker
17/07/11 08:04:32 INFO spark.SparkEnv: Registering BlockManagerMaster
17/07/11 08:04:32 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/07/11 08:04:32 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/07/11 08:04:32 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-241eda29-1cb3-4364-859c-79ba86689fbf
17/07/11 08:04:32 INFO memory.MemoryStore: MemoryStore started with capacity 5.2 GB
17/07/11 08:04:32 INFO spark.SparkEnv: Registering OutputCommitCoordinator
17/07/11 08:04:32 INFO util.log: Logging initialized @1581ms
17/07/11 08:04:32 INFO server.Server: jetty-9.2.z-SNAPSHOT
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@a7e2d9d{/jobs,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@754777cd{/jobs/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2b52c0d6{/jobs/job,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@372ea2bc{/jobs/job/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4cc76301{/stages,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2f08c4b{/stages/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3f19b8b3{/stages/stage,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7de0c6ae{/stages/stage/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@a486d78{/stages/pool,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@cdc3aae{/stages/pool/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7ef2d7a6{/storage,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5dcbb60{/storage/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4c36250e{/storage/rdd,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@21526f6c{/storage/rdd/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@49f5c307{/environment,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@299266e2{/environment/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5471388b{/executors,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@66ea1466{/executors/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1601e47{/executors/threadDump,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3bffddff{/executors/threadDump/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@66971f6b{/static,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@50687efb{/,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@517bd097{/api,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@142eef62{/jobs/job/kill,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4a9cc6cb{/stages/stage/kill,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO server.ServerConnector: Started Spark@6de54b40{HTTP/1.1}{0.0.0.0:4040}
17/07/11 08:04:32 INFO server.Server: Started @1696ms
17/07/11 08:04:32 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
17/07/11 08:04:32 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.16.15:4040
17/07/11 08:04:32 INFO spark.SparkContext: Added JAR file:/usr/local/spark/examples/jars/spark-examples_2.11-2.1.1.jar at spark://10.0.16.15:38173/jars/spark-examples_2.11-2.1.1.jar with timestamp 1499760272476
17/07/11 08:04:32 INFO client.StandaloneAppClient$ClientEndpoint: Connecting to master spark://ip-10-0-16-15.ap-southeast-1.compute.internal:7077...
17/07/11 08:04:32 INFO client.TransportClientFactory: Successfully created connection to ip-10-0-16-15.ap-southeast-1.compute.internal/10.0.16.15:7077 after 27 ms (0 ms spent in bootstraps)
17/07/11 08:04:32 INFO cluster.StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20170711080432-0038
17/07/11 08:04:32 INFO client.StandaloneAppClient$ClientEndpoint: Executor added: app-20170711080432-0038/0 on worker-20170707101056-10.0.16.51-40051 (10.0.16.51:40051) with 4 cores
17/07/11 08:04:32 INFO cluster.StandaloneSchedulerBackend: Granted executor ID app-20170711080432-0038/0 on hostPort 10.0.16.51:40051 with 4 cores, 20.0 GB RAM
17/07/11 08:04:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 35723.
17/07/11 08:04:32 INFO netty.NettyBlockTransferService: Server created on 10.0.16.15:35723
17/07/11 08:04:32 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/07/11 08:04:32 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.16.15, 35723, None)
17/07/11 08:04:32 INFO client.StandaloneAppClient$ClientEndpoint: Executor updated: app-20170711080432-0038/0 is now RUNNING
17/07/11 08:04:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.0.16.15:35723 with 5.2 GB RAM, BlockManagerId(driver, 10.0.16.15, 35723, None)
17/07/11 08:04:32 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.16.15, 35723, None)
17/07/11 08:04:32 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.0.16.15, 35723, None)
17/07/11 08:04:32 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@34448e6c{/metrics/json,null,AVAILABLE,@Spark}
17/07/11 08:04:32 INFO cluster.StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
17/07/11 08:04:33 WARN fs.FileSystem: Cannot load filesystem
java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.fs.s3a.S3AFileSystem could not be instantiated
    at java.util.ServiceLoader.fail(ServiceLoader.java:232)
    at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
    at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2631)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2650)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:172)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:357)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.spark.streaming.StreamingContext.checkpoint(StreamingContext.scala:238)
    at org.apache.spark.examples.streaming.KafkaWordCount$.main(KafkaWordCount.scala:54)
    at org.apache.spark.examples.streaming.KafkaWordCount.main(KafkaWordCount.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/fs/StorageStatistics
    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
    at java.lang.Class.getConstructor0(Class.java:3075)
    at java.lang.Class.newInstance(Class.java:412)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
    ... 24 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.StorageStatistics
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 29 more
17/07/11 08:04:33 WARN spark.SparkContext: Spark is not running in local mode, therefore the checkpoint directory must not be on the local filesystem. Directory 'file:/home/mahendra/checkpoint' appears to be on the local filesystem.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/kafka/KafkaUtils$
    at org.apache.spark.examples.streaming.KafkaWordCount$.main(KafkaWordCount.scala:57)
    at org.apache.spark.examples.streaming.KafkaWordCount.main(KafkaWordCount.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaUtils$
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 11 more
17/07/11 08:04:33 INFO spark.SparkContext: Invoking stop() from shutdown hook
17/07/11 08:04:33 INFO server.ServerConnector: Stopped Spark@6de54b40{HTTP/1.1}{0.0.0.0:4040}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4a9cc6cb{/stages/stage/kill,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@142eef62{/jobs/job/kill,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@517bd097{/api,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@50687efb{/,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@66971f6b{/static,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3bffddff{/executors/threadDump/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1601e47{/executors/threadDump,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@66ea1466{/executors/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5471388b{/executors,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@299266e2{/environment/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@49f5c307{/environment,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@21526f6c{/storage/rdd/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4c36250e{/storage/rdd,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5dcbb60{/storage/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7ef2d7a6{/storage,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@cdc3aae{/stages/pool/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@a486d78{/stages/pool,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7de0c6ae{/stages/stage/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3f19b8b3{/stages/stage,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2f08c4b{/stages/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4cc76301{/stages,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@372ea2bc{/jobs/job/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2b52c0d6{/jobs/job,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@754777cd{/jobs/json,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@a7e2d9d{/jobs,null,UNAVAILABLE,@Spark}
17/07/11 08:04:33 INFO ui.SparkUI: Stopped Spark web UI at http://10.0.16.15:4040
17/07/11 08:04:33 INFO cluster.StandaloneSchedulerBackend: Shutting down all executors
17/07/11 08:04:33 INFO cluster.CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
17/07/11 08:04:33 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/07/11 08:04:33 INFO memory.MemoryStore: MemoryStore cleared
17/07/11 08:04:33 INFO storage.BlockManager: BlockManager stopped
17/07/11 08:04:33 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
17/07/11 08:04:33 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/07/11 08:04:33 INFO spark.SparkContext: Successfully stopped SparkContext
17/07/11 08:04:33 INFO util.ShutdownHookManager: Shutdown hook called
17/07/11 08:04:33 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-a7875c5c-cdfc-486e-bf7d-7fe0a7cff228

Thanks!

  • How are you creating the uber JAR for `spark-examples_2.11-2.1.1.jar`? Can you show us your build.sbt? – Yuval Itzchakov Jul 11 '17 at 08:27
  • @YuvalItzchakov I'm using the already created jar that is present at `SPARK_HOME/examples/jars/spark-examples_2.11-2.1.1.jar` . While doing other try in creating a jar myself, I added maven dependency (https://spark.apache.org/docs/latest/streaming-programming-guide.html#linking) in `SPARK_HOME/pom.xml` and ran `mvn package` and then used `SPARK_HOME/target/original-spark-examples_2.11-2.1.1.jar` but then also it shows same result. – Mahendra Singh Meena Jul 11 '17 at 08:38
  • I have not used Maven or sbt before; I mostly use Spark's Python API, but unfortunately for Kafka 0.10 and higher the Python API for Spark Streaming is not available. If you can explain more about how to build these uber JARs properly, that would be really helpful. – Mahendra Singh Meena Jul 11 '17 at 08:39
  • I don't think `spark-examples_2.11-2.1.1.jar` contains the JAR for Kafka. You can read online how to use `sbt-assembly` when building a JAR with SBT. – Yuval Itzchakov Jul 11 '17 at 08:43
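Following the `sbt-assembly` suggestion in the comments, a minimal build for a standalone streaming job might look like the sketch below (versions are assumptions based on the Spark 2.1.1 / Scala 2.11 setup above; `spark-core` and `spark-streaming` are marked `provided` because the cluster supplies them, while the Kafka integration is bundled into the fat JAR because it is not shipped with Spark):

    // build.sbt (sketch)
    name := "kafka-wordcount"
    version := "0.1"
    scalaVersion := "2.11.8"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"      % "2.1.1" % "provided",
      "org.apache.spark" %% "spark-streaming" % "2.1.1" % "provided",
      // bundled into the assembly, since the cluster does not ship it
      "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.1.1"
    )

With the `sbt-assembly` plugin enabled in `project/plugins.sbt`, running `sbt assembly` produces a single fat JAR under `target/scala-2.11/` that can be passed directly to `spark-submit`, with no manual copying into `$SPARK_HOME/jars`.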
