I want to stream Twitter data with Kafka and run sentiment analysis on it with Spark. The producer works well: it pushes data from the Twitter API into a Kafka topic, but I get an error on the Spark consumer side.
Below is the code for the Spark session, with the Kafka package referenced from the documentation:
# Config
spark = SparkSession \
    .builder \
    .master("local[*]") \
    .appName("TwitterSentimentAnalysis") \
    .config("spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.0") \
    .getOrCreate()
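For reference: the `_2.12` suffix in that package coordinate is the Scala binary version the connector was compiled against, and it must match both the Scala line and the Spark version of the running cluster. The sketch below (plain Python, with a hypothetical helper name `kafka_connector_matches`, not part of any library) shows the matching rule that a coordinate like the one above has to satisfy:

```python
def kafka_connector_matches(coordinate: str, spark_version: str, scala_version: str) -> bool:
    """Check that a spark-sql-kafka coordinate matches the running Spark/Scala build.

    coordinate     e.g. 'org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.0'
    spark_version  e.g. '3.4.0' (the value of spark.version)
    scala_version  e.g. '2.12'  (major.minor of the Scala build Spark ships with)
    """
    _, artifact, version = coordinate.split(":")
    # The artifact name ends in '_<scala binary version>', e.g. '_2.12'.
    suffix = artifact.rsplit("_", 1)[1]
    return suffix == scala_version and version == spark_version

# A connector built for Scala 2.12 loaded into a Scala 2.13 Spark build is
# exactly the kind of mismatch that surfaces as NoSuchMethodError at load() time.
print(kafka_connector_matches(
    "org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.0", "3.4.0", "2.13"))  # -> False
```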
But when I try to read the data from the Kafka topic:
df = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:9092") \
    .option("subscribe", "twitter") \
    .option("startingOffsets", "latest") \
    .load() \
    .selectExpr("CAST(value AS STRING) as message")
it shows an error like this:
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o36.load.
: java.lang.NoSuchMethodError: 'scala.collection.mutable.WrappedArray scala.Predef$.wrapRefArray(java.lang.Object[])'
I have already tried switching to other versions of Kafka, Spark, and PySpark, and I still get the error. I also tried running Spark and Kafka in Docker, but got the same error there too.
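Note: this particular `NoSuchMethodError` on `scala.Predef$.wrapRefArray` is usually a Scala binary-version mismatch rather than a Kafka problem. Assuming `spark-submit` is on the PATH, one way to check which Scala line the Spark build uses is:

```
spark-submit --version
# The version banner includes a line such as "Using Scala version 2.12.17".
# If it reports 2.13, the matching coordinate for Spark 3.4.0 would use the
# _2.13 suffix instead of _2.12.
```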