I am trying to set up Apache Spark with Kafka and wrote a simple program running locally, but it fails and I am not able to figure out why from the debug output.

build.gradle.kts

implementation("org.jetbrains.kotlin:kotlin-stdlib:1.4.0")
implementation("org.jetbrains.kotlinx.spark:kotlin-spark-api-3.0.0_2.12:1.0.0-preview1")
compileOnly("org.apache.spark:spark-sql_2.12:3.0.0")
implementation("org.apache.kafka:kafka-clients:3.0.0")

The main function code is:

val spark = SparkSession
    .builder()
    .master("local[*]")
    .appName("Ship metrics")
    .orCreate

val shipmentDataFrame = spark
    .readStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "test")
    .option("includeHeaders", "true")
    .load()

val query = shipmentDataFrame.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

query.writeStream()
    .format("console")
    .outputMode("append")
    .start()
    .awaitTermination()

and I am getting this error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Failed to find data source: kafka. Please deploy the application as per the deployment section of "Structured Streaming + Kafka Integration Guide".;
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:666)
    at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:194)
    at com.tgt.ff.axon.shipmetriics.stream.ShipmentStream.run(ShipmentStream.kt:23)
    at com.tgt.ff.axon.shipmetriics.ApplicationKt.main(Application.kt:12)
21/12/25 22:22:56 INFO SparkContext: Invoking stop() from shutdown hook 

1 Answer

The Kotlin API for Spark by JetBrains (https://github.com/Kotlin/kotlin-spark-api) has support for streaming since the 1.1.0 update. There is also an example with a Kafka stream which might be of help to you: https://github.com/Kotlin/kotlin-spark-api/blob/spark-3.2/examples/src/main/kotlin/org/jetbrains/kotlinx/spark/examples/streaming/KotlinDirectKafkaWordCount.kt

It does use the Spark DStream API instead of the Spark Structured Streaming API you appear to be using.

You can, of course, also still use Structured Streaming if you prefer that, but then the application needs to be deployed as described in the "Structured Streaming + Kafka Integration Guide" referenced in your error message, i.e. with the Kafka connector for Spark SQL on the classpath.
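Concretely, the `AnalysisException: Failed to find data source: kafka` typically means the Kafka connector artifact is missing from the classpath; the `kafka-clients` dependency alone is not enough. A minimal sketch of the extra line for your `build.gradle.kts`, assuming Spark 3.0.0 with Scala 2.12 as in your build file (adjust the versions to match your actual Spark/Scala setup):

```kotlin
// build.gradle.kts (sketch): Kafka source/sink for Structured Streaming.
// The _2.12 suffix and the 3.0.0 version must match the Scala and Spark
// versions of your other Spark dependencies.
implementation("org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0")
```

If you instead submit the job with `spark-submit`, the same artifact can be supplied at launch time via `--packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0` rather than being bundled into the build.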