I am trying to build a Kafka consumer that uses the MongoDB Spark Connector in the same program: take the Kafka input as an RDD, convert it to a DataFrame, and then store it in MongoDB for later use (a rough sketch of that write step is at the end of this post).
My producer is up and running, and the following "standard" consumer receives the messages nicely:
# Spark
from pyspark import SparkContext
# Spark Streaming
from pyspark.streaming import StreamingContext
# Kafka
from pyspark.streaming.kafka import KafkaUtils
# json parsing
import json
sc = SparkContext(appName="PythonSparkStreamingKafka_RM_01")
sc.setLogLevel("WARN")
# 30-second batch interval
ssc = StreamingContext(sc, 30)
# Receiver-based stream: Zookeeper quorum, consumer group, and {topic: number of receiver threads}
kafkaStream = KafkaUtils.createStream(ssc, 'localhost:2181', 'spark-streaming-consumer', {'trump': 1})
# Messages arrive as (key, value) tuples; parse the JSON payload
parsed = kafkaStream.map(lambda v: json.loads(v[1]))
parsed.pprint()
ssc.start()
ssc.awaitTermination()
The "modified" consumer I want to use, which is built through SparkSessionBuilder with the config options to use mongodb looks like this:
# Additional imports for session building and preprocessing
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql import SparkSession
import collections
# Spark Streaming
from pyspark.streaming import StreamingContext
# Kafka
from pyspark.streaming.kafka import KafkaUtils
# json parsing
import json
# Build the SparkSession
spark = SparkSession.builder \
.master("local") \
.appName("TrumpTweets") \
.config("spark.executor.memory", "1gb") \
.config("spark.mongodb.input.uri", "mongodb://127.0.0.1/trumptweets.tweets") \
.config("spark.mongodb.output.uri", "mongodb://127.0.0.1/trumptweets.tweets") \
.getOrCreate()
ssc = StreamingContext(spark.sparkContext, 30)
kafkaStream = KafkaUtils.createStream(ssc, 'localhost:2181', 'spark-streaming-consumer', {'trump':1})
parsed = kafkaStream.map(lambda v: json.loads(v[1]))
parsed.pprint()
ssc.start()
ssc.awaitTermination()
It runs fine, but does not receive any messages... I don't see anything different other than the SparkSession builder, and it doesn't produce any error messages either.
Please help me out, I'm really stuck on this one... Any other way is also appreciated!
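For reference, once the messages do arrive, this is roughly what I plan to do with each batch to get it into MongoDB. It is only an untested sketch: re-serializing the parsed dicts and reading them with spark.read.json is just my way of turning each micro-batch into a DataFrame, and the write relies on the spark.mongodb.output.uri already set in the session config:
# Rough, untested sketch of the intended write step; this registration would go
# before ssc.start() in the script above.
def save_to_mongo(rdd):
    if not rdd.isEmpty():
        # json.loads gave us Python dicts; re-serialize so spark.read.json can infer a schema
        df = spark.read.json(rdd.map(json.dumps))
        df.write \
            .format("com.mongodb.spark.sql.DefaultSource") \
            .mode("append") \
            .save()

parsed.foreachRDD(save_to_mongo)
If there is a cleaner way to hand each micro-batch to the MongoDB Spark Connector, I'm happy to change that part as well.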