I can't figure out how to obtain data from a Pykafka consumer. I can't even print the topics from the consumer. Whatever method I call on the consumer, the process hangs forever. If I just initialize the consumer without using it, the process finishes. Thank you in advance for any help.

import pykafka
# Assumption: this KafkaConsumer signature matches kafka-python, not pykafka
from kafka import KafkaConsumer
from pyspark.streaming import StreamingContext

def getData(spark):
    spark.sparkContext.setLogLevel("WARN")
    # StreamingContext expects a SparkContext rather than a SparkSession
    scc = StreamingContext(spark.sparkContext, 1)
    topic = "justtopic"
    client = pykafka.KafkaClient("localhost:9092")
    KAFKA_VERSION = (0, 10)
    print("topics", client.topics)  # this line works

    consumer = KafkaConsumer(
        topic, bootstrap_servers="localhost:9092",
        api_version=KAFKA_VERSION
    )

    print(consumer.topics())  # calling any method on the consumer hangs forever
    # rdd = kafkaStream.flatMap(lambda line: line.strip().split("\n")).map(lambda strelem: float(strelem))
    # print("****** ", rdd.count())
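
For comparison, here is a minimal sketch that stays entirely inside pykafka and bounds how long the consumer may block, which can help tell a real hang from a consumer that is simply waiting for messages. The helper names `parse_floats` and `consume_with_timeout`, the 5-second timeout, and the `broker_version` value are assumptions for illustration; the payload format (newline-separated numbers) is guessed from the commented-out RDD line.

```python
def parse_floats(raw):
    """Split a raw message payload (bytes) into floats, one per non-empty line."""
    text = raw.decode("utf-8").strip()
    return [float(part) for part in text.split("\n") if part.strip()]


def consume_with_timeout(hosts="localhost:9092", topic_name="justtopic",
                         timeout_ms=5000, max_messages=10,
                         broker_version="0.10.0"):
    """Read up to max_messages payloads, giving up after timeout_ms without data."""
    # Imported lazily so parse_floats can be used without pykafka installed.
    import pykafka

    # broker_version is an assumed value; adjust it to the cluster in use.
    client = pykafka.KafkaClient(hosts=hosts, broker_version=broker_version)
    topic = client.topics[topic_name.encode("utf-8")]

    # consumer_timeout_ms makes iteration stop instead of blocking forever.
    consumer = topic.get_simple_consumer(consumer_timeout_ms=timeout_ms)
    values = []
    try:
        for message in consumer:
            if message is not None:
                values.extend(parse_floats(message.value))
            if len(values) >= max_messages:
                break
    finally:
        consumer.stop()
    return values
```

If `consume_with_timeout()` returns an empty list after the timeout, the consumer connected but received nothing; if it still never returns, the problem is more likely in connecting to the broker than in reading the topic.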

  • I'm not sure I understand your code. Why aren't you instantiating your consumer from your pykafka client, e.g. sometopic.get_simple_consumer()? – Yannick Jul 24 '19 at 20:23
  • @Yannick Because if you don't specify the Kafka version, it throws an error. –  Jul 25 '19 at 07:38
  • Do you see any warnings or relevant log messages from Kafka (partition assignment, etc.)? If you run a simple console consumer (from the Kafka binaries), does it work fine for this topic? – Yannick Jul 25 '19 at 09:43
  • @Yannick Yes. It runs just fine when I consume the topic from the terminal with consumer.sh, and I don't get any error messages. From Spark, the moment I invoke any method on the consumer object, it hangs forever. There are also no errors in the log files. –  Jul 25 '19 at 10:22
  • You have tagged this question with spark and pyspark. Is there a reason why you want to use pykafka when Spark can receive Kafka messages via structured streaming with this [package](https://mvnrepository.com/artifact/org.apache.spark/spark-sql-kafka-0-10_2.11/2.4.0)? You can find a beginner section [here](https://spark.apache.org/docs/2.2.0/structured-streaming-kafka-integration.html). – cronoik Jul 25 '19 at 14:56
  • @cronoik Thank you. –  Jul 25 '19 at 15:33
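
The structured streaming route suggested in the comments could look roughly like the sketch below. It is not a tested fix for the hang; it assumes the `spark-sql-kafka-0-10` package linked above is on the classpath, and the helper names (`kafka_source_options`, `read_topic_as_strings`, `print_stream`) and the `startingOffsets` choice are hypothetical.

```python
def kafka_source_options(bootstrap_servers="localhost:9092", topic="justtopic"):
    """Build the option map for Spark's built-in Kafka source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Assumption: start from the oldest available offsets.
        "startingOffsets": "earliest",
    }


def read_topic_as_strings(spark, options):
    """Return a streaming DataFrame with each message value cast to a string."""
    raw = spark.readStream.format("kafka").options(**options).load()
    return raw.selectExpr("CAST(value AS STRING) AS value")


def print_stream(df):
    """Start a query that prints each micro-batch to the console."""
    return df.writeStream.format("console").outputMode("append").start()
```

With an existing SparkSession `spark`, usage would be `query = print_stream(read_topic_as_strings(spark, kafka_source_options()))` followed by `query.awaitTermination()`. The package can typically be supplied at launch with `--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0`, assuming Spark 2.4.0 built for Scala 2.11.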
