
I have set up Kafka and Spark Streaming using Maven on my system. I would like suggestions that could help me do broader operations beyond typing something into the producer and seeing it in the consumer.

How can I create a source that continuously puts data such as JSON or Avro into a Kafka producer, so that I can process it with Spark and perform some operations on it? I need suggestions on how to design this.

Ruslan Ostafiichuk
  • give us more details on the source of your data – Vale Jul 08 '16 at 10:35
  • I'm considering giving the source as Avro or Protobuf –  Jul 08 '16 at 10:39
  • And I'm just doing it as an exercise. I have to create a source myself –  Jul 08 '16 at 10:40
  • I deleted my answer as it is unrelated, then. Have you already got your hands on the directory watch? That could be a way – Vale Jul 08 '16 at 10:57
  • Directory watch? I don't get you, buddy. Sorry –  Jul 08 '16 at 11:01
  • have a look at this fileStream: http://spark.apache.org/docs/latest/streaming-programming-guide.html#basic-sources – Vale Jul 08 '16 at 11:56
  • Is there a way I can make the data feed into the Kafka producer continuously? –  Jul 08 '16 at 11:59

1 Answer


Please find the link below.

https://github.com/hortonworks-gallery/tutorials/blob/master/2015-09-26-transporting-real-time-event-stream-with-apache-kafka.md

This is an HDP tutorial. If you are not using the HDP stack, you can ignore the initial part of the tutorial.

It includes a Kafka producer, packaged as a jar file you can build, that uses the Kafka Java API to produce truck events from a New York City truck routes (KML) file.
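In the same spirit, a minimal producer that continuously pushes JSON events into Kafka can be written with the Kafka Java client. This is only a sketch, not the tutorial's code: the broker address, the topic name ("truck-events"), and the event fields are assumptions you would replace with your own.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class JsonEventProducer {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Producer<String, String> producer = new KafkaProducer<>(props);
        int id = 0;
        while (true) { // emit one JSON event per second, indefinitely
            String json = String.format("{\"id\": %d, \"speed\": %.1f}", id, Math.random() * 100);
            // "truck-events" is a hypothetical topic name; create it first or enable auto-creation
            producer.send(new ProducerRecord<>("truck-events", Integer.toString(id), json));
            id++;
            Thread.sleep(1000);
        }
    }
}
```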

You need to download the data file, the Java code, and the jar file; the details are in the tutorial.
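On the Spark side, once events are flowing into Kafka, a direct stream can pick them up. Below is a minimal sketch using the spark-streaming-kafka integration for Kafka 0.8 (the createDirectStream API); it assumes the spark-streaming-kafka artifact is on your classpath and reuses the broker and topic names from the producer sketch above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class TruckEventConsumer {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("TruckEventConsumer").setMaster("local[2]");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", "localhost:9092"); // assumed broker
        Set<String> topics = new HashSet<>(Arrays.asList("truck-events")); // assumed topic

        JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
                jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
                kafkaParams, topics);

        // A trivial operation to start with: extract the JSON payloads and print each batch
        stream.map(record -> record._2()).print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```

From there you can replace the print() with whatever processing you need, for example parsing the JSON and counting events per window.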

Hope this helps

Tinto James