I'm trying to run a word count example that integrates an AWS Kinesis stream with Apache Spark. Random lines are put into Kinesis at regular intervals.
I create the stream with lines = KinesisUtils.createStream(...). When I submit my application, lines.pprint() does not print any values.
I tried printing the lines object itself and I see <pyspark.streaming.dstream.TransformedDStream object at 0x7fa235724950>.
How can I print the contents of this TransformedDStream object and check whether any data is being received?
I'm sure there is no credentials issue: if I use false credentials, I get an access exception.
I've added the code below for reference:
import sys

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream

if __name__ == "__main__":
    sc = SparkContext(appName="SparkKinesisApp")
    ssc = StreamingContext(sc, 1)  # 1-second batch interval
    # Arguments: Kinesis app name, stream name, endpoint URL, region,
    # initial position, checkpoint interval in seconds
    lines = KinesisUtils.createStream(
        ssc, "SparkKinesisApp", "myStream",
        "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
        InitialPositionInStream.LATEST, 2)
    # lines.saveAsTextFiles('/home/ubuntu/logs/out.txt')
    lines.pprint()
    # Split each line into words and count occurrences per batch
    counts = lines.flatMap(lambda line: line.split(" ")) \
        .map(lambda word: (word, 1)) \
        .reduceByKey(lambda a, b: a + b)
    counts.pprint()
    ssc.start()
    ssc.awaitTermination()
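
For what it's worth, here is a minimal sketch of the kind of per-batch check I have in mind, in case it helps narrow things down. It assumes DStream.foreachRDD, which calls a two-argument function with the batch time and the batch's RDD. The helper names format_batch and log_batch are hypothetical (they are not part of my application or of Spark), and the sample size of 3 is arbitrary.

```python
def format_batch(batch_time, count, sample):
    """Build a one-line summary of a micro-batch (hypothetical helper)."""
    return "batch %s: %d record(s), sample=%r" % (batch_time, count, sample)


def log_batch(batch_time, rdd):
    """Print how many records the batch holds, plus a small sample.

    Deliberately returns None: the callback is only used for its side effect.
    """
    count = rdd.count()                    # runs a Spark job for this batch
    sample = rdd.take(3) if count else []  # skip the extra job on empty batches
    print(format_batch(batch_time, count, sample))


# Assumed wiring, placed before ssc.start() in the script above:
# lines.foreachRDD(log_batch)
```

If every batch reports 0 records, that would suggest nothing is reaching the receiver, rather than pprint() failing to display data that is there.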