I want to stream Twitter data with Kafka and run sentiment analysis on it with Spark. The producer works well: it pushes data from the Twitter API into a Kafka topic, but I get an error on the Spark consumer side.
Below is the code for the Spark session, with the Kafka package referenced from the documentation:
# Config
spark = SparkSession \
    .builder \
    .master("local[*]") \
    .appName("TwitterSentimentAnalysis") \
    .config("spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.0") \
    .getOrCreate()
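For reference: the `_2.12` suffix in that package coordinate is the Scala binary version the connector was compiled against, and it must match both the Scala line and the Spark version of the running cluster. The sketch below (plain Python, with a hypothetical helper name `kafka_connector_matches`, not part of any library) shows the matching rule that a coordinate like the one above has to satisfy:

```python
def kafka_connector_matches(coordinate: str, spark_version: str, scala_version: str) -> bool:
    """Check that a spark-sql-kafka coordinate matches the running Spark/Scala build.

    coordinate     e.g. 'org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.0'
    spark_version  e.g. '3.4.0' (the value of spark.version)
    scala_version  e.g. '2.12'  (major.minor of the Scala build Spark ships with)
    """
    _, artifact, version = coordinate.split(":")
    # The artifact name ends in '_<scala binary version>', e.g. '_2.12'.
    suffix = artifact.rsplit("_", 1)[1]
    return suffix == scala_version and version == spark_version

# A connector built for Scala 2.12 loaded into a Scala 2.13 Spark build is
# exactly the kind of mismatch that surfaces as NoSuchMethodError at load() time.
print(kafka_connector_matches(
    "org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.0", "3.4.0", "2.13"))  # -> False
```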
But when I try to read the data from the Kafka topic:
df = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:9092") \
    .option("subscribe", "twitter") \
    .option("startingOffsets", "latest") \
    .load() \
    .selectExpr("CAST(value AS STRING) as message")
it shows an error like this:
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o36.load.
: java.lang.NoSuchMethodError: 'scala.collection.mutable.WrappedArray scala.Predef$.wrapRefArray(java.lang.Object[])'
I have already tried switching to other versions of Kafka, Spark, and PySpark, and I still get the error. I also tried running Spark and Kafka in Docker, but got the same error there too.
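Note: this particular `NoSuchMethodError` on `scala.Predef$.wrapRefArray` is usually a Scala binary-version mismatch rather than a Kafka problem. Assuming `spark-submit` is on the PATH, one way to check which Scala line the Spark build uses is:

```
spark-submit --version
# The version banner includes a line such as "Using Scala version 2.12.17".
# If it reports 2.13, the matching coordinate for Spark 3.4.0 would use the
# _2.13 suffix instead of _2.12.
```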