
I am using the following code to get messages from Kafka.

Scala code:

val lines: ReceiverInputDStream[(String, String)] = KafkaUtils.createStream(ssc,
 zookeeperQuorum, consumerGroup, topicMap)
lines.print(10)

Here is my sample producer code:

    from kafka import SimpleProducer, KafkaClient
    import time

    # To send messages synchronously
    kafka = KafkaClient(serverip + ':' + port)
    producer = SimpleProducer(kafka)
    kafka.ensure_topic_exists('test')
    kafka.ensure_topic_exists('test1')

    while True:
        print "sending message "
        producer.send_messages(b'test', 'test,msg')
        time.sleep(2)
        producer.send_messages(b'test1', 'test1,msg')
        time.sleep(2)

My streaming receiver prints

(null,'test,msg')
(null,'test1,msg')

Questions:

1) How can I differentiate messages per topic without actually decoding each message?

2) Why is it giving me null in the output? The documentation says the stream yields (key, value) tuples. How can I produce messages as (key, value) tuples?

EDIT: With KeyedProducer

from kafka import KeyedProducer, KafkaClient
import time

kafka = KafkaClient(serverip + ':' + port)
producer = KeyedProducer(kafka)

kafka.ensure_topic_exists('test2')

while True:
    print "sending msg "
    producer.send_messages(b'test2', b'key1', 'msg')
    time.sleep(2)

This is throwing the following error:

raise PartitionUnavailableError("%s not available" % str(key))
kafka.common.PartitionUnavailableError: TopicAndPartition(topic='test2', partition='key1') not available
Knight71

2 Answers


For #1, the simplest approach is to have a separate stream for each topic; if at any point you need them combined and they have the same structure, you can union them.
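As a rough illustration of the idea in plain Python (not Spark code; `chain` stands in for `DStream.union`, and the topic names and payloads are the ones from the question): keep one per-topic collection with each record tagged by its topic, then merge. The tag survives the union, so no payload decoding is needed.

```python
from itertools import chain

# One "stream" per topic, each record tagged with its topic name.
test_stream = [('test', 'test,msg')]
test1_stream = [('test1', 'test1,msg')]

# Merge the streams; this stands in for DStream.union.
merged = list(chain(test_stream, test1_stream))

# Records remain distinguishable per topic after the union.
by_topic = {}
for topic, msg in merged:
    by_topic.setdefault(topic, []).append(msg)
```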

For #2, have you tried using KeyedProducer?

Snippet from the link above:

producer = KeyedProducer(kafka)
producer.send_messages(b'my-topic', b'key1', b'some message')
producer.send_messages(b'my-topic', b'key2', b'this methode')
Alex Larikov
  • I have tried the KeyedProducer, but I got this error: kafka.common.PartitionUnavailableError: TopicAndPartition(topic='test2', partition='test1') not available. How do I create a partition? – Knight71 Jan 11 '16 at 17:31
  • for #1, Then I need to allocate that many cores for the receiver right ? – Knight71 Jan 11 '16 at 17:32
  • You create at least one partition (or more) when you create each topic. But for testing you should be fine with all the defaults (topic creation will create 1 partition), and the API should use a hash partitioner by default. The exception looks like something else went wrong. Do you have a snippet of the code where you create the KeyedProducer and send messages? – Alex Larikov Jan 11 '16 at 18:12
  • For number of cores - yes, for each additional receiver you need additional core. General recommendation is to have more cores than number of receivers – Alex Larikov Jan 11 '16 at 18:24
  • I tried your code for KeyedProducer and it works just fine on my local instance of kafka with python 2.7 and latest version of python-kafka. Are you sure that topic exists? do you actually see this topic in Kafka? If it doesn't exist and auto-creation of topics is disabled - it might produce similar error. – Alex Larikov Jan 13 '16 at 00:08
  • Also what version of python-kafka do you use? is it some specific version or just the latest from pypi? As `partition='key1'` makes me think that there is some different method signature available where 2nd parameter is a partitionId, while there is no such signature in current version available – Alex Larikov Jan 13 '16 at 00:10
  • Thanks . I upgraded my kafka package to 0.9.5 and was able to send the keyed messages. – Knight71 Jan 16 '16 at 15:14

For question #1 you can use this signature:

def createDirectStream[K, V, KD <: Decoder[K], VD <: Decoder[V], R](
    ssc: StreamingContext,
    kafkaParams: Map[String, String],
    fromOffsets: Map[TopicAndPartition, Long],
    messageHandler: (MessageAndMetadata[K, V]) ⇒ R): InputDStream[R]

This will give you access to the MessageAndMetadata class, which holds the topic name plus other metadata such as the partition number and message offset. For example:

KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, Map[String, String]](
  ssc,
  Map("metadata.broker.list" -> "localhost:9092"),
  fromOffsets, // Map[TopicAndPartition, Long]: starting offset for each partition
  (mm: MessageAndMetadata[String, String]) => Map(mm.topic -> mm.message))

Then you can pattern match on the map key to do whatever you want.
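A rough Python analogue of that dispatch (the handler behavior here is made up for illustration; the topic names are the ones from the question): pull the topic out of each topic-to-message map and branch on it.

```python
# Hypothetical per-topic handlers; mirrors matching on mm.topic above.
def handle(topic, message):
    if topic == 'test':
        return 'test handler got: ' + message
    elif topic == 'test1':
        return 'test1 handler got: ' + message
    return 'unknown topic: ' + message

# Shape produced by the messageHandler above: a map of topic -> message.
record = {'test': 'test,msg'}
results = [handle(t, m) for t, m in record.items()]
```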

fady zohdy