
**I'm trying to stream data from Kafka and convert it into a DataFrame, following this link.**

**But when I run both the producer and consumer applications, this is the output on my console:**

(0,[B@370ed56a) (1,[B@2edd3e63) (2,[B@3ba2944d) (3,[B@2eb669d1) (4,[B@49dd304c) (5,[B@4f6af565) (6,[B@7714e29e)

This is literally the output of the Kafka producer; the topic was empty before these messages were pushed.
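That `[B@370ed56a` form is not the record contents; it is the JVM's default `toString` for a byte array (the `[B` type tag plus a hash code). A quick Scala illustration (the hash suffix varies per run):

println(Array[Byte](1, 2, 3))           // prints something like [B@1b6d3586
println(Array[Byte](1, 2, 3).mkString)  // prints 123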

Here is the producer code snippet:

import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import com.twitter.bijection.Injection;
import com.twitter.bijection.avro.GenericAvroCodecs;

Properties props = new Properties();
props.put("bootstrap.servers", "##########:9092");
props.put("key.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer",
        "org.apache.kafka.common.serialization.ByteArraySerializer");
// "producer.type" belongs to the old Scala producer API; the new
// KafkaProducer ignores it and always sends asynchronously
props.put("producer.type", "async");

Schema.Parser parser = new Schema.Parser();
Schema schema = parser.parse(EVENT_SCHEMA);
// Bijection Injection that serializes a GenericRecord to Avro binary
Injection<GenericRecord, byte[]> records = GenericAvroCodecs.toBinary(schema);

KafkaProducer<String, byte[]> producer = new KafkaProducer<String, byte[]>(props);
for (int i = 0; i < 100; i++) {
    GenericData.Record avroRecord = new GenericData.Record(schema);
    setEventValues(i, avroRecord);
    byte[] messages = records.apply(avroRecord);
    ProducerRecord<String, byte[]> producerRecord = new ProducerRecord<String, byte[]>(
            "topic", String.valueOf(i), messages);
    System.out.println(producerRecord);
    producer.send(producerRecord);
}
// close() flushes buffered records so they are actually sent before the JVM exits
producer.close();

And its output is:

key=0, value=[B@680387a key=1, value=[B@32bfb588 key=2, value=[B@2ac2e1b1 key=3, value=[B@606f4165 key=4, value=[B@282e7f59

Here is my consumer code snippet, written in Scala:

"group.id" -> "KafkaConsumer",
"zookeeper.connection.timeout.ms" -> "1000000"

val topicMaps = Map("topic" -> 1)
val messages = KafkaUtils.createStream[String, Array[Byte], StringDecoder, DefaultDecoder](ssc, kafkaConf, topicMaps, StorageLevel.MEMORY_ONLY_SER)
messages.print()

I've tried both `StringDecoder` and `DefaultDecoder` in `createStream()`. I'm sure the producer and the consumer are in compliance with each other. Any help from anybody?
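For reference, `createStream` only hands back the raw `Array[Byte]` values; the Avro payload still has to be inverted explicitly. Below is a minimal sketch of doing that with the same bijection `Injection` the producer builds, assuming the `EVENT_SCHEMA` string is also available on the consumer side:

import com.twitter.bijection.Injection
import com.twitter.bijection.avro.GenericAvroCodecs
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord

val decoded = messages.map { case (key, bytes) =>
  // Parse the schema and build the Injection inside the closure, so that no
  // non-serializable Avro objects are captured by the Spark task
  val schema = new Schema.Parser().parse(EVENT_SCHEMA)
  val injection: Injection[GenericRecord, Array[Byte]] =
    GenericAvroCodecs.toBinary[GenericRecord](schema)
  // invert returns a Try[GenericRecord]; .get throws on corrupt payloads
  val record = injection.invert(bytes).get
  (key, record.toString) // GenericRecord.toString renders the record as JSON
}
decoded.print()

From the resulting JSON strings (or the `GenericRecord` fields directly) a DataFrame can then be built per batch, which is what the linked Hortonworks article ultimately does.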

  • I'm not sure what the problem is here. You're sending a byte array and receiving a byte array, which is what you're printing to the console. – Yuval Itzchakov Dec 20 '16 at 13:00
  • The output of the messages RDD is the same as the producer's; createStream is not converting the byte[]. – jack AKA karthik Dec 20 '16 at 13:03
  • I'm expecting this as the result from the messages RDD: {"action":"AppEvent","tenantid":1173,"lat":0.0,"lon":0.0,"memberid":55,"event_name":"CATEGORY_CLICK","productUpccd":0,"device_type":"iPhone","device_os_ver":"10.1","item_name":"CHICKEN"} – jack AKA karthik Dec 20 '16 at 13:07
  • `createStream` doesn't touch the actual data, it's up to you to deserialize it however you need. – Yuval Itzchakov Dec 20 '16 at 13:07
  • So how can I deserialise it? This link clearly says we can create a data frame out of the byte[]: https://community.hortonworks.com/articles/33275/receiving-avro-messages-through-kafka-in-a-spark-s.html#comment-71277. But here I'm getting an empty table. – jack AKA karthik Dec 20 '16 at 13:12
  • +------+--------+---+---+--------+----------+------------+-----------+-------------+---------+
    |action|tenantid|lat|lon|memberid|event_name|productUpccd|device_type|device_os_ver|item_name|
    +------+--------+---+---+--------+----------+------------+-----------+-------------+---------+
    +------+--------+---+---+--------+----------+------------+-----------+-------------+---------+
    – jack AKA karthik Dec 20 '16 at 13:12
  • I tried the same with [String, String], but I get an error while creating a data frame out of the Avro schema. Here is the link: http://stackoverflow.com/questions/41237929/value-toint-is-not-a-member-of-object – jack AKA karthik Dec 20 '16 at 13:17
  • @jackAKAkarthik in that link they use `def parseAVROToString(rawTweet: Array[Byte]): String` to parse the `byte[]` – maasg Dec 20 '16 at 16:42
  • Yes @maasg, I had it in my code too; initially I thought createStream would do the necessary parsing, so I posted only that snippet. – jack AKA karthik Dec 20 '16 at 16:55
  • You need something to transform the `Array[Byte]` to some representation. The example uses some Twitter representation. You will need to adapt to your own usecase. – maasg Dec 20 '16 at 17:42
  • If you want the decoding to happen when the stream is consumed, you need to implement your own [Decoder](https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/serializer/Decoder.scala) – maasg Dec 20 '16 at 17:45 (a sketch of such a decoder follows this list)
  • Yeah, this is true. But I couldn't get the expected result by following the link. I wonder what the reason could be. – jack AKA karthik Dec 20 '16 at 18:37
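Following maasg's suggestion, here is a minimal sketch of such a custom decoder. The class name `AvroToJsonDecoder` is hypothetical, and it again assumes the producer's `EVENT_SCHEMA` string is available on the consumer's classpath:

import kafka.serializer.Decoder
import kafka.utils.VerifiableProperties
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import com.twitter.bijection.Injection
import com.twitter.bijection.avro.GenericAvroCodecs

// Kafka instantiates decoders reflectively, passing a VerifiableProperties argument
class AvroToJsonDecoder(props: VerifiableProperties = null) extends Decoder[String] {
  private val schema = new Schema.Parser().parse(EVENT_SCHEMA)
  private val injection: Injection[GenericRecord, Array[Byte]] =
    GenericAvroCodecs.toBinary[GenericRecord](schema)

  // Invert the Avro binary payload back to a GenericRecord and render it as JSON
  override def fromBytes(bytes: Array[Byte]): String =
    injection.invert(bytes).get.toString
}

With this decoder the stream can be created as `KafkaUtils.createStream[String, String, StringDecoder, AvroToJsonDecoder](ssc, kafkaConf, topicMaps, StorageLevel.MEMORY_ONLY_SER)`, so each message arrives already decoded to its JSON form.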

0 Answers