I can't figure out how to get data from a Pykafka consumer. I have the same problem even when I only try to print the topics from the consumer. Whatever method I call on the consumer, the process hangs forever. If I just initialize the consumer without using it, the process finishes. Thanks in advance for any help.
import pykafka
from kafka import KafkaConsumer  # KafkaConsumer comes from the kafka-python package
from pyspark.streaming import StreamingContext

def getData(spark):
    spark.sparkContext.setLogLevel("WARN")
    scc = StreamingContext(spark, 1)
    topic = "justtopic"
    client = pykafka.KafkaClient("localhost:9092")
    KAFKA_VERSION = (0, 10)
    print("topics", client.topics)  # this line works
    consumer = KafkaConsumer(
        'justtopic', bootstrap_servers='localhost:9092',
        api_version=KAFKA_VERSION
    )
    print(consumer.topics())  # calling any method on the consumer hangs forever
    # rdd = kafkaStream.flatMap(lambda line: line.strip().split("\n")).map(lambda strelem: float(strelem))
    # print("****** ", rdd.count())
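To narrow down which call blocks, the suspect call can be run in a daemon thread with a timeout, so the script reports the hang instead of waiting forever. This is only a diagnostic sketch using the standard library: `call_with_timeout` is a hypothetical helper name, not part of pykafka or kafka-python, and it does not fix whatever causes the hang.

```python
import threading


def call_with_timeout(func, timeout_s=5.0):
    """Run func() in a daemon thread.

    Returns (True, result) if func finished within timeout_s seconds,
    or (False, None) if it is still running when the timeout expires.
    Exceptions raised by func are re-raised in the calling thread.
    """
    outcome = {}

    def worker():
        try:
            outcome["result"] = func()
        except Exception as exc:  # hand the error back to the caller
            outcome["error"] = exc

    # A daemon thread does not keep the interpreter alive if func never returns.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)

    if thread.is_alive():
        return False, None
    if "error" in outcome:
        raise outcome["error"]
    return True, outcome.get("result")


# Hypothetical usage with the consumer from the snippet above:
# finished, topics = call_with_timeout(consumer.topics, timeout_s=10)
# print("finished:", finished, "topics:", topics)
```

If `finished` comes back as `False`, the call really is blocking inside the client rather than just being slow, which at least confirms where to look.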