I have the following Spark job:
from __future__ import print_function
import os
import sys
import time
from random import random
from operator import add
from pyspark.streaming import StreamingContext
from pyspark import SparkContext,SparkConf
from pyspark.streaming.kafka import KafkaUtils
from pyspark.sql import SQLContext, Row
from pyspark_cassandra import streaming,CassandraSparkContext
if __name__ == "__main__":
    conf = SparkConf().setAppName("PySpark Cassandra Test")
    sc = CassandraSparkContext(conf=conf)
    stream = StreamingContext(sc, 2)
    rdd = sc.cassandraTable("keyspace2", "users").collect()
    # print rdd
    stream.start()
    stream.awaitTermination()
    sc.stop()
When I run this, it gives me the following error:
ERROR StreamingContext: Error starting the context, marking it as stopped
java.lang.IllegalArgumentException: requirement failed: No output operations registered, so nothing to execute
The shell command I run:
./bin/spark-submit --packages TargetHolding:pyspark-cassandra:0.2.4 examples/src/main/python/test/reading-cassandra.py
Compared with my Spark Streaming job for Kafka, the code above is missing a line like this one:
kafkaStream = KafkaUtils.createStream(stream, 'localhost:2181', "name", {'topic':1})
There I use createStream to create the input stream, but for Cassandra I can't find anything like it in the docs. How do I start streaming between Spark Streaming and Cassandra?
Versions:
Cassandra v2.1.12
Spark v1.4.1
Scala 2.10
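
For reference, here is a minimal sketch of what the error seems to be asking for, as far as I can tell: start() requires at least one output operation (such as foreachRDD or pprint) registered on a DStream, and the collect() call in my job is a plain batch action that the StreamingContext never sees. The sketch replays the collected rows through queueStream purely to have a DStream to attach an output operation to. That choice, and the names describe_batch, log_batch and main, are my own assumptions for illustration and are not taken from the pyspark-cassandra docs. It does not turn Cassandra into a live streaming source.

```python
def describe_batch(time, rows):
    """Format one micro-batch for logging (pure Python, no Spark needed)."""
    return "batch at %s: %d row(s)" % (time, len(rows))


def main():
    # Imported locally so describe_batch can be used without Spark installed.
    from pyspark import SparkConf
    from pyspark.streaming import StreamingContext
    from pyspark_cassandra import CassandraSparkContext

    conf = SparkConf().setAppName("PySpark Cassandra Test")
    sc = CassandraSparkContext(conf=conf)
    ssc = StreamingContext(sc, 2)  # 2-second batches, as in the job above

    # Batch read: an ordinary RDD action, invisible to the StreamingContext.
    # Rows are turned into strings so nothing depends on the table's schema.
    users = [str(row) for row in
             sc.cassandraTable("keyspace2", "users").collect()]

    # Assumption: replay the snapshot through a queue-backed DStream, only so
    # that there is a stream to register an output operation on.
    dstream = ssc.queueStream([sc.parallelize(users)])

    def log_batch(time, rdd):
        print(describe_batch(time, rdd.collect()))

    # foreachRDD is an output operation; with it registered, start() has
    # something to execute.
    dstream.foreachRDD(log_batch)

    ssc.start()
    ssc.awaitTermination(10)  # wait up to 10 seconds, then shut down
    ssc.stop()
```

The script passed to spark-submit would need to end with a call to main(). Once the single queued RDD has been consumed, later batches are empty, so log_batch should report 0 rows for them.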