
The following is my Python code for a Kafka producer. I'm not sure whether the messages are actually being published to the Kafka broker, because the consumer side isn't receiving any messages. My consumer Python program works fine when I test it with the console producer command.
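
For reference, the console producer test I mention is roughly this (assuming the broker runs on localhost:9092 and the topic is named 'test', as in the code below):

./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test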

from __future__ import print_function

import sys
from pyspark import SparkContext
from kafka import KafkaClient, SimpleProducer

if __name__ == "__main__":

    if len(sys.argv) != 2:
        print("Usage: spark-submit producer1.py <input file>", file=sys.stderr)
        sys.exit(-1)

    sc = SparkContext(appName="PythonRegression")

    def sendkafka(messages):
        ## Connect to the broker
        kafka = KafkaClient("localhost:9092")
        producer = SimpleProducer(kafka, async=True, batch_send_every_n=5,
                                  batch_send_every_t=10)
        send_counts = 0
        for message in messages:
            try:
                print(message)
                ## Set the topic name and push the message to the Kafka broker
                yield producer.send_messages('test', message.encode('utf-8'))
            except Exception as e:
                print("Error: %s" % str(e))
            else:
                send_counts += 1
        print("The count of prediction results which were sent IN THIS "
              "PARTITION is %d.\n" % send_counts)

    ## Connect and read the file
    rawData = sc.textFile(sys.argv[1])

    ## Find and skip the header row
    dataHeader = rawData.first()
    data = rawData.filter(lambda x: x != dataHeader)

    ## Send each partition to Kafka and materialize the results
    sentRDD = data.mapPartitions(sendkafka)
    sentRDD.collect()

    ## Stop the Spark context
    sc.stop()
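
One way to rule the Spark side in or out is to send a single message synchronously with the same kafka-python client and inspect the broker's response. A minimal sketch (assuming a broker on localhost:9092 and the 'test' topic from the code above):

from kafka import KafkaClient, SimpleProducer

kafka = KafkaClient("localhost:9092")
## Without async=True the producer is synchronous, so send_messages blocks
## and returns the broker's produce response(s); an error-free response
## means the message actually reached the broker
producer = SimpleProducer(kafka)
response = producer.send_messages('test', b'hello from a plain producer')
print(response)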

This is my "Consumer" python coding

from __future__ import print_function
import sys
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

if len(sys.argv) < 3:
    print("Program to pull the messages from Kafka brokers.")
    print("Usage: consume.py <zk> <topic>", file=sys.stderr)

else:
    ## Load settings from system properties, for launching via spark-submit
    sc = SparkContext(appName="PythonStreamingKafkaWordCount")

    ## Create a StreamingContext using an existing SparkContext,
    ## with a 10-second batch interval
    ssc = StreamingContext(sc, 10)

    ## Get everything after the Python script name
    zkQuorum, topic = sys.argv[1:]

    ## Create an input stream that pulls messages from the Kafka brokers
    kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer",
                                  {topic: 1})

    ## Keep only the message values (drop the keys)
    lines = kvs.map(lambda x: x[1])

    ## Print the messages pulled from the Kafka brokers
    lines.pprint()

    ## Save the pulled messages as files
    ## lines.saveAsTextFiles("OutputA")

    ## Start receiving data and processing it
    ssc.start()

    ## Wait for the termination of the context
    ssc.awaitTermination()
不好笑

3 Answers


I usually debug such issues with kafka-console-consumer (part of Apache Kafka), consuming from the topic you tried producing to. If the console consumer gets the messages, you know they arrived in Kafka.
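
For example, with the topic from the question (assuming the classic console consumer, which connects through a local ZooKeeper on the default port 2181):

./bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning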

If you first run the producer, let it finish, and then start the consumer, the issue may be that the consumer starts from the end of the log and waits for additional messages. Either make sure you start the consumer first, or configure it to automatically start at the beginning (sorry, not sure how to do that with your Python client).
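
With the Spark receiver from the question, one way that might work is passing the consumer config through the kafkaParams argument (a sketch, assuming the old ZooKeeper-based consumer, where "smallest" plays the role of "earliest"; it only takes effect when the consumer group has no committed offsets yet):

## Hypothetical tweak to the createStream call from the consumer code above
kvs = KafkaUtils.createStream(
    ssc, zkQuorum, "spark-streaming-consumer", {topic: 1},
    kafkaParams={"auto.offset.reset": "smallest"})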

Gwen Shapira

You can check whether the number of messages in the topic increases as you send produce requests:

./bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list <Kafka_broker_hostname>:<broker_port> --topic Que1 \
  --time -1 --offsets 1 | awk -F ":" '{sum += $3} END {print sum}'

If the number of messages is increasing, the producer is working fine. (With --time -1, GetOffsetShell prints the latest offset of each partition, and the awk command sums them, so the output is the total number of messages written to the topic.)

Rakesh Rakshit

Alright, I think there's something wrong with my local ZooKeeper or Kafka, because when I tested it on another server it worked perfectly. Anyway, thanks to everyone who replied ;)

不好笑