I have a Spark job that I am running with the following command:
sudo ./bin/spark-submit --jars lib/spark-streaming-kafka-assembly_2.10-1.4.1.jar \
--packages TargetHolding:pyspark-cassandra:0.2.4 \
examples/src/main/python/final/kafka-sparkstreaming-cassandra.py
However, the job sometimes fails after running continuously for, say, two days, and I then have to restart it manually.
That defeats the purpose for me, because the job is supposed to read data from Kafka and save it to Cassandra continuously.
What does Spark offer for this kind of fault tolerance? Could spark-submit simply be launched again automatically, or is there something smarter? I tried to Google this, but found very little information about it.
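The naive approach I can think of is a watchdog that simply relaunches spark-submit whenever it exits. Here is a minimal sketch of what I mean (the command is just the one from above; the 60-second backoff is an arbitrary number I picked):

    #!/usr/bin/env python
    # Watchdog sketch: relaunch spark-submit whenever the driver exits.
    import subprocess
    import time

    CMD = [
        "./bin/spark-submit",
        "--jars", "lib/spark-streaming-kafka-assembly_2.10-1.4.1.jar",
        "--packages", "TargetHolding:pyspark-cassandra:0.2.4",
        "examples/src/main/python/final/kafka-sparkstreaming-cassandra.py",
    ]

    while True:
        exit_code = subprocess.call(CMD)  # blocks until the job dies
        print("spark-submit exited with code %d; restarting in 60s" % exit_code)
        time.sleep(60)  # crude backoff so a crash loop doesn't spin hot

I also saw a --supervise flag for spark-submit (standalone and Mesos cluster mode) that supposedly restarts the driver on failure, but I couldn't find much detail on whether it applies to my setup.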
P.S. - I am using Spark 1.4.1.
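From what I have read, Spark Streaming's own mechanism for this is checkpointing: the driver periodically writes its state to a reliable directory and can rebuild the StreamingContext from it after a crash. Below is a rough sketch of how I understand it; the checkpoint path, batch interval, Kafka settings, and the save function are all placeholders, and I am not sure StreamingContext.getOrCreate is even available in PySpark 1.4.1:

    # Checkpoint-based recovery sketch; all names and settings are placeholders.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    CHECKPOINT_DIR = "hdfs:///checkpoints/kafka-cassandra"  # placeholder path

    def save_to_cassandra(time, rdd):
        # Placeholder: each batch would be written to Cassandra here,
        # e.g. via the pyspark-cassandra package.
        pass

    def create_context():
        sc = SparkContext(appName="kafka-sparkstreaming-cassandra")
        ssc = StreamingContext(sc, 10)   # 10-second batches, arbitrary
        ssc.checkpoint(CHECKPOINT_DIR)   # enable metadata checkpointing
        # Placeholder ZooKeeper quorum, consumer group and topic map:
        stream = KafkaUtils.createStream(ssc, "zk-host:2181", "my-group", {"my-topic": 1})
        stream.foreachRDD(save_to_cassandra)
        return ssc

    # Fresh start: calls create_context(). After a crash: rebuilds the
    # context and DStream graph from CHECKPOINT_DIR instead.
    ssc = StreamingContext.getOrCreate(CHECKPOINT_DIR, create_context)
    ssc.start()
    ssc.awaitTermination()

Would something like this survive a driver crash on its own, or do I still need an external restart on top of it?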
I hope to receive some good ideas!
Thanks!