
I have a Kafka stream coming in on some input topic. This is the code I wrote to consume the Kafka stream.

from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

conf = SparkConf().setAppName(appname)
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 2)  # batch interval (in seconds) is required
kvs = KafkaUtils.createDirectStream(ssc, topics,
                                    {"metadata.broker.list": brokers})

Then I create two DStreams of the keys and values of the original stream.

keys = kvs.map(lambda x: x[0].split(" ")) 
values = kvs.map(lambda x: x[1].split(" "))

Then I perform some computation on the values DStream. For example:

val = values.flatMap(lambda x: x*2)

Now I need to combine the keys DStream with the val DStream and return the result as a Kafka stream.
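
For pushing the combined results back out to Kafka, I am planning something roughly like this (untested sketch; the kafka-python producer, the out_topic name, and the result DStream are placeholders for whatever the combined stream ends up being):

from kafka import KafkaProducer  # placeholder: any Kafka producer library would do

def send_partition(records):
    # one producer per partition; "out_topic" is a placeholder output topic
    producer = KafkaProducer(bootstrap_servers=brokers)
    for record in records:
        producer.send("out_topic", str(record).encode("utf-8"))
    producer.flush()

result.foreachRDD(lambda rdd: rdd.foreachPartition(send_partition))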

How do I combine each element of val with its corresponding key?

vidhan

1 Answer


You can just use the join operator on the two DStreams to merge them. When you call map, you are essentially creating another stream, so join will help you merge them back together.

For example:

Joined_Stream = keys.join(values)  # then apply any operation, e.g. map, flatMap, ...
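
Note that join works on pair DStreams, so both sides need to be keyed. A rough, untested sketch using the Kafka message key from the question (the keyed_* names are placeholders, not part of the original code):

# join() needs pair DStreams, so key both sides by the original Kafka key
keyed_keys = kvs.map(lambda kv: (kv[0], kv[0].split(" ")))
keyed_vals = kvs.map(lambda kv: (kv[0], kv[1].split(" ")))

# each batch then yields (key, (split_key, split_values)) tuples,
# which can be transformed further with map, flatMap, etc.
joined_stream = keyed_keys.join(keyed_vals)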
Manav Garg
  • I did not get this part `(any operation like map, flatmap...)`. Can you elaborate more? – vidhan Aug 19 '16 at 01:06
  • I don't fully understand what you actually want to do (I provided a generic answer for merging two DStreams). The thing is, if you do a flatMap on the values, there is no way to map them back to the keys, since the output would be one flattened list. By merging the two DStreams you can create RDDs each containing elements of both keys and values, just that there would not be a one-to-one mapping. – Manav Garg Aug 19 '16 at 05:41
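
(To illustrate the flattening issue mentioned above with made-up data: once per-message lists are flattened into one stream, the message boundaries, and with them the keys, are gone.)

# made-up batch: two messages keyed k1 and k2
values_batch = [["1", "2"], ["3", "4"]]

# flattening merges the per-message lists into a single list,
# so '3' can no longer be tied back to k2
flattened = [v for vs in values_batch for v in vs]   # ['1', '2', '3', '4']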