Spark version = 2.3.0
Kafka version = 1.0.0
Snippet of the code being used:
from pyspark.streaming.kafka import KafkaUtils
# ZooKeeper endpoints used by the Kafka consumer
zkQuorum = '192.168.2.10:2181,192.168.2.12:2181'
topic = 'Test_topic'
# Create a Kafka stream
kafkaStream = KafkaUtils.createStream(ssc, zkQuorum, "cyd-demo-azureactivity-streaming-consumer", {topic: 1})
When the Kafka stream and Spark run at the same time, I see Spark pulling data in real time. However, if I start Kafka, say, an hour before Spark, Spark does not pick up the hour-old data.
Is this expected, or is there a configuration setting that changes this behaviour?
Code run using:
sudo $SPARK_HOME/spark-submit --master local[2] --jars /home/steven/jars/elasticsearch-hadoop-6.3.2.jar,/home/steven/jars/spark-streaming-kafka-0-8-assembly_2.11-2.3.1.jar /home/steven/code/demo/test.py
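One setting that may be relevant here, shown as a minimal sketch and not a confirmed fix: createStream accepts an optional kafkaParams dict, and the Kafka 0.8 high-level consumer it wraps reads "auto.offset.reset", which takes "smallest" or "largest". My assumption is that the older messages are skipped because the consumer group has no committed offsets yet and the consumer then starts from the latest offset. If the group already has offsets stored in ZooKeeper, this setting would not apply. The helper names build_kafka_params and create_stream are hypothetical, introduced only for illustration.

```python
# Allowed values for the Kafka 0.8 high-level consumer setting.
VALID_OFFSET_RESETS = ("smallest", "largest")


def build_kafka_params(offset_reset="smallest"):
    """Build the extra consumer settings passed to createStream.

    "smallest" asks the consumer to start from the earliest available
    offset when the group has no committed offset (assumption: that is
    the situation in the question).
    """
    if offset_reset not in VALID_OFFSET_RESETS:
        raise ValueError("offset_reset must be one of %s" % (VALID_OFFSET_RESETS,))
    return {"auto.offset.reset": offset_reset}


def create_stream(ssc, zk_quorum, group_id, topic):
    """Hypothetical wrapper around the createStream call from the snippet."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.streaming.kafka import KafkaUtils

    return KafkaUtils.createStream(
        ssc,
        zk_quorum,
        group_id,
        {topic: 1},
        kafkaParams=build_kafka_params("smallest"),
    )
```

With the values from the snippet, the call would look like create_stream(ssc, zkQuorum, "cyd-demo-azureactivity-streaming-consumer", topic). Testing with a fresh group id may be the simplest way to check whether previously committed offsets are the cause.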