
I'm using Kafka and Spark Streaming for a project written in Python. I want to send data from a Kafka producer to my streaming program. It works smoothly when I execute the following command with the dependencies specified:

./spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.1.0 ./kafkastreaming.py

Is there any way I can specify the dependencies and run the streaming code directly (i.e. without using spark-submit, or with spark-submit but without specifying the dependencies)?

I tried specifying the dependencies in spark-defaults.conf in the conf directory of Spark. The specified dependencies were:

1. org.apache.spark:spark-streaming-kafka-0-8_2.11:2.1.0
2. org.apache.spark:spark-streaming-kafka-0-8-assembly:2.1.1
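For reference, a sketch of how such an entry can be written in spark-defaults.conf, using Spark's spark.jars.packages property, which takes comma-separated Maven coordinates and mirrors the --packages option:

    spark.jars.packages  org.apache.spark:spark-streaming-kafka-0-8_2.11:2.1.0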

NOTE - I followed the Spark Streaming guide's netcat example from https://spark.apache.org/docs/latest/streaming-programming-guide.html and it worked without using the spark-submit command, hence I want to know if I can do the same with Kafka and Spark Streaming.
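For what it's worth, a minimal sketch of one common approach (not from the original post): set PYSPARK_SUBMIT_ARGS inside the script before importing pyspark, so the Kafka package is pulled in even when the script is launched with a plain python interpreter. It assumes pyspark is importable (e.g. installed via pip or on PYTHONPATH), and the ZooKeeper address, consumer group, and topic name are placeholders:

    import os

    # Must be set before pyspark is imported; the --packages coordinates
    # must come before the literal "pyspark-shell" token.
    os.environ["PYSPARK_SUBMIT_ARGS"] = (
        "--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.1.0 pyspark-shell"
    )

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="KafkaStreaming")
    ssc = StreamingContext(sc, 10)  # 10-second batches

    # "localhost:2181", "my-group" and "my-topic" are placeholders.
    stream = KafkaUtils.createStream(ssc, "localhost:2181", "my-group", {"my-topic": 1})
    stream.pprint()

    ssc.start()
    ssc.awaitTermination()

With this in place, the script can be started with python kafkastreaming.py instead of spark-submit.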

– Akhilesh

1 Answer


Place your additional dependencies in the "jars" folder of your Spark distribution, then stop and start Spark again. This way, the dependencies will be resolved at runtime without adding any extra option to your command line.
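A sketch of that procedure for a standalone setup; the jar file name and paths are placeholders, and stop-all.sh/start-all.sh are the standalone cluster scripts:

    # copy the dependency jar (placeholder name) into Spark's jars folder
    cp spark-streaming-kafka-0-8-assembly_2.11-2.1.0.jar "$SPARK_HOME/jars/"
    # restart Spark so the new jar is picked up
    "$SPARK_HOME/sbin/stop-all.sh"
    "$SPARK_HOME/sbin/start-all.sh"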

– Maximilien Belinga
Comment: Hi! I added the dependencies "spark-streaming-kafka-0-8_2.11-2.1.0.jar" and "spark-streaming-kafka-0-8-assembly_2.10-2.1.1.jar" to the "jars" folder of Spark and executed spark-submit without the "--packages" option; it gives an error saying that it can't find those dependencies. – Akhilesh Jun 28 '17 at 07:20