Let's say I have two RDDs. The first RDD is composed of strings, each one an HTTP request log line:
rdd1 :
serverIP:80 clientIP1 - - [10/Jun/2016:10:47:37 +0200] "GET /path/to/page1 [...]"
serverIP:80 clientIP2 - - [11/Jun/2016:11:25:12 +0200] "GET /path/to/page2 [...]"
...
The second RDD simply contains numbers (floats):
rdd2 :
0.025
0.56
...
I would like to concatenate the two line by line, in order to obtain a third RDD like this:
rdd3 :
serverIP:80 clientIP1 - - [10/Jun/2016:10:47:37 +0200] "GET /path/to/page1 [...]" 0.025
serverIP:80 clientIP2 - - [11/Jun/2016:11:25:12 +0200] "GET /path/to/page2 [...]" 0.56
...
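To make the expected result concrete, here is a minimal sketch in plain Python (no Spark involved) of the pairing I am after. The function name `concat_lines` and the list names are hypothetical and only serve the illustration; the sample lines are copied from above, with the `[...]` elision left as-is. The built-in `zip` here stands in for whatever the distributed equivalent would be. PySpark does have an `RDD.zip` method, but it assumes both RDDs have the same number of partitions and the same number of elements in each partition, and I don't know whether that holds in my case.

```python
def concat_lines(requests, values):
    """Pair the i-th request line with the i-th value and join them with a space."""
    return ["{} {}".format(line, value) for line, value in zip(requests, values)]


# Sample data copied from the question (the "[...]" is an elision, kept as-is).
rdd1_lines = [
    'serverIP:80 clientIP1 - - [10/Jun/2016:10:47:37 +0200] "GET /path/to/page1 [...]"',
    'serverIP:80 clientIP2 - - [11/Jun/2016:11:25:12 +0200] "GET /path/to/page2 [...]"',
]
rdd2_values = [0.025, 0.56]

# Plain lists stand in for the RDDs; this only shows the desired output shape.
rdd3_lines = concat_lines(rdd1_lines, rdd2_values)
for line in rdd3_lines:
    print(line)
```

The important property is that the pairing is purely positional: there is no key shared between the two datasets, so a regular join is not an option.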
By the way, this is a streaming job. That is to say, I don't want to store the data permanently in some kind of SQL table or anything similar.
Any idea on how to tackle this?
Thanks in advance !
EDIT: For people trying to join DStreams rather than RDDs, have a look at this: How to Combine two Dstreams using Pyspark (similar to .zip on normal RDD)