
I have set up Kafka and Spark Streaming using Maven on my system. I would like suggestions that could help me do broader operations beyond typing something into the producer and seeing it in the consumer.

How can I create a source that continuously puts data such as JSON or Avro into a Kafka producer, so that I can process it with Spark and perform some operations on it? I need suggestions on how to design this.

Ruslan Ostafiichuk
  • give us more details on the source of your data – Vale Jul 08 '16 at 10:35
  • I'm considering giving the source as Avro or Protobuf –  Jul 08 '16 at 10:39
  • And I'm just doing it as an exercise. I have to create a source myself –  Jul 08 '16 at 10:40
  • I deleted my answer as it is unrelated, then. Have you already got your hands on the directory watch? That could be a way – Vale Jul 08 '16 at 10:57
  • Directory watch? I don't get you, buddy. Sorry –  Jul 08 '16 at 11:01
  • have a look at this fileStream: http://spark.apache.org/docs/latest/streaming-programming-guide.html#basic-sources – Vale Jul 08 '16 at 11:56
  • Is there a way I can make the data feed into the Kafka producer continuously? –  Jul 08 '16 at 11:59

1 Answer


Please find the link below.

https://github.com/hortonworks-gallery/tutorials/blob/master/2015-09-26-transporting-real-time-event-stream-with-apache-kafka.md

This is an HDP tutorial. If you are not using the HDP stack, you can ignore the initial part of the tutorial.

It includes a Kafka producer, packaged as a jar file you can build, that uses the Kafka Java API to produce truck events from a New York City truck routes (KML) file.
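In the same spirit, a minimal producer that continuously pushes JSON events into Kafka can be written with the Kafka Java client. This is only a sketch, not the tutorial's code: the broker address, the topic name ("truck-events"), and the event fields are assumptions you would replace with your own.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class JsonEventProducer {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Producer<String, String> producer = new KafkaProducer<>(props);
        int id = 0;
        while (true) { // emit one JSON event per second, indefinitely
            String json = String.format("{\"id\": %d, \"speed\": %.1f}", id, Math.random() * 100);
            // "truck-events" is a hypothetical topic name; create it first or enable auto-creation
            producer.send(new ProducerRecord<>("truck-events", Integer.toString(id), json));
            id++;
            Thread.sleep(1000);
        }
    }
}
```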

You need to download the data file, the Java code, and the jar file; the details are in the tutorial.
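On the Spark side, once events are flowing into Kafka, a direct stream can pick them up. Below is a minimal sketch using the spark-streaming-kafka integration for Kafka 0.8 (the createDirectStream API); it assumes the spark-streaming-kafka artifact is on your classpath and reuses the broker and topic names from the producer sketch above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class TruckEventConsumer {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("TruckEventConsumer").setMaster("local[2]");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", "localhost:9092"); // assumed broker
        Set<String> topics = new HashSet<>(Arrays.asList("truck-events")); // assumed topic

        JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
                jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
                kafkaParams, topics);

        // A trivial operation to start with: extract the JSON payloads and print each batch
        stream.map(record -> record._2()).print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```

From there you can replace the print() with whatever processing you need, for example parsing the JSON and counting events per window.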

Hope this helps

Tinto James