
I have parsed the data and generated the following RDD:

x [RDD] = (458817,(CompactBuffer(20),CompactBuffer((837063182,0,1433142639864), (676690466,0,1433175090184), (4642913327036075112,1,1433177284025), (464291332,1,1433182403135), (4642913327036075112,0,1433185531150), 
(464291332,0,1433186067803), (4642913327036075112,1,1433186266561), (851805971,0,1433190829047), 
(6376558263039679112,1,1433203286945), (837063182,0,1433226615856), (8403476884799939112,0,1433287740066), 
(764990231,0,1433289484047), (4642913327036075112,0,1433351165901), (464291332,1,1433351892238), 
(4642913327036075112,0,1433374808826), (584492430,1,1433436093253))))
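To make the shape of one record explicit, here is a minimal sketch of it as plain Scala types. It assumes the key is a `Long`, the first buffer holds `Int`s, and each event is a `(Long, Int, Long)` triple. The field meanings (id, flag, timestamp) and the names `Event`, `Record` and `RecordShape` are hypothetical, and a `Vector` stands in for Spark's `CompactBuffer`. Ids such as 4642913327036075112 are larger than `Int.MaxValue`, so they need a `Long`.

```scala
// Hypothetical type aliases describing one record of the RDD `x`.
// Vector stands in for Spark's CompactBuffer (both are Scala Seqs).
object RecordShape {
  type Event  = (Long, Int, Long)               // (id, flag, timestamp): field names assumed
  type Record = (Long, (Seq[Int], Seq[Event]))  // (key, (first buffer, events))

  // A shortened copy of the record from the question (first three events only).
  val sample: Record =
    (458817L, (Vector(20), Vector(
      (837063182L, 0, 1433142639864L),
      (676690466L, 0, 1433175090184L),
      (4642913327036075112L, 1, 1433177284025L)
    )))

  def main(args: Array[String]): Unit = {
    val (key, (_, events)) = sample
    println(s"key=$key, events=${events.size}")
    // The largest id does not fit in an Int, which is why Event uses Long.
    println(events.exists(_._1 > Int.MaxValue))
  }
}
```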

Here I am showing only one record of the RDD. My goal is to get the following RDD, where the first element (the key) is attached to each tuple:

(458817,837063182,0,1433142639864) 
(458817,676690466,0,1433175090184) 
(458817,464291332,1,1433177284025) 
(458817,464291332,1,1433182403135) 
(458817,464291332,0,1433185531150) 
(458817,464291332,0,1433186067803) 
(458817,464291332,1,1433186266561) 
(458817,851805971,0,1433190829047) 
(458817,637655826,1,1433203286945) 
(458817,837063182,0,1433226615856)

By doing a flatMap I lose the first element and no longer have access to it:

val r = x.map(l => l._2).flatMap(x => x._2).map(x => (x._1, x._2, x._3, x._4))
add-semi-colons
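The key disappears because `map(l => l._2)` discards it before the `flatMap` runs. A minimal sketch of one way around this, assuming the record shape shown in the question, is to destructure the whole pair inside a single `flatMap`, so the key stays in scope while the inner buffer is expanded. It uses a plain `Seq` in place of the RDD and `Vector` in place of `CompactBuffer`; the object and value names are made up for illustration. Since `RDD.flatMap` also takes a function that returns a collection, the same lambda should carry over.

```scala
object KeepKey {
  // One record shaped like the question's data (Vector in place of CompactBuffer).
  val records: Seq[(Long, (Seq[Int], Seq[(Long, Int, Long)]))] = Seq(
    (458817L, (Vector(20), Vector(
      (837063182L, 0, 1433142639864L),
      (676690466L, 0, 1433175090184L)
    )))
  )

  // What the question's pipeline does: the key is dropped by the first map.
  val lost: Seq[(Long, Int, Long)] = records.map(_._2).flatMap(_._2)

  // Keep the key in scope by destructuring the whole record inside flatMap.
  val kept: Seq[(Long, Long, Int, Long)] = records.flatMap { case (key, (_, events)) =>
    events.map { case (id, flag, ts) => (key, id, flag, ts) }
  }

  def main(args: Array[String]): Unit = {
    lost.foreach(println)
    kept.foreach(println)
  }
}
```

Printing `kept` gives lines in the target format, e.g. `(458817,837063182,0,1433142639864)`.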

2 Answers


This would probably give you the wanted result:

val r = for {
  el <- Seq(x._1)
  (el1, el2, el3) <- x._2._2
} yield (el, el1, el2, el3)

Lift the first element into a Seq so it can be used in the for expression, then pull out the second CompactBuffer and yield the wanted tuples.

thwiegan
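Here is a runnable sketch of that for expression applied to a single record. It assumes `x` is one `(key, (buffer, events))` tuple like the record in the question, with `Vector` standing in for `CompactBuffer` and the object name chosen for illustration. Because the first generator is a `Seq`, the result is a local `Seq` of 4-tuples, not an RDD.

```scala
object ForPerRecord {
  // One record shaped like the question's x (Vector in place of CompactBuffer).
  val x: (Long, (Seq[Int], Seq[(Long, Int, Long)])) =
    (458817L, (Vector(20), Vector(
      (837063182L, 0, 1433142639864L),
      (4642913327036075112L, 1, 1433177284025L)
    )))

  // The answer's for expression: lift the key into a Seq, then pair it with each event.
  val r: Seq[(Long, Long, Int, Long)] = for {
    el <- Seq(x._1)
    (el1, el2, el3) <- x._2._2
  } yield (el, el1, el2, el3)

  def main(args: Array[String]): Unit = r.foreach(println)
}
```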
  • Does the for start inside the map? – add-semi-colons Aug 10 '15 at 17:10
  • The problem is that this won't give me an RDD, right? It gives a list, and after this I can't access each element (e.g. el1, el2, ...) at a later time. – add-semi-colons Aug 10 '15 at 17:46
  • Phew, good question. But since this basically uses map and flatMap, it should generate the same collection type (CompactBuffer). Maybe you need to lift the first element to a CompactBuffer instead of a Seq. I'm not quite sure how to do this, though. I don't have a test environment with Spark, so I can't really test it. – thwiegan Aug 10 '15 at 17:49
  • Yep, figured out a solution; maybe I should post it as an answer :) – add-semi-colons Aug 10 '15 at 18:46
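On the concern that the for expression yields a local collection rather than an RDD: a sketch, assuming the same record shape, is to run the per-record for expression inside `flatMap` on the outer collection. `flatMap` concatenates the per-record results, so the outcome keeps the outer collection's type; with Spark, `RDD.flatMap` returns an RDD. A `Seq` stands in for the RDD here, and the second record is made up purely to show two records being concatenated.

```scala
object ForInsideFlatMap {
  type Record = (Long, (Seq[Int], Seq[(Long, Int, Long)]))

  // The first record comes from the question; the second is made up for illustration.
  val records: Seq[Record] = Seq(
    (458817L, (Vector(20), Vector((837063182L, 0, 1433142639864L), (676690466L, 0, 1433175090184L)))),
    (1L, (Vector(5), Vector((42L, 1, 100L))))
  )

  // flatMap runs the per-record for expression and concatenates the results,
  // giving one flat collection of 4-tuples.
  val flat: Seq[(Long, Long, Int, Long)] = records.flatMap { x =>
    for {
      el <- Seq(x._1)
      (el1, el2, el3) <- x._2._2
    } yield (el, el1, el2, el3)
  }

  def main(args: Array[String]): Unit = {
    // Each element is still a tuple, so its fields remain accessible later.
    flat.foreach { case (key, id, flag, ts) => println(s"$key $id $flag $ts") }
  }
}
```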

This gave me the exact structure I wanted.

val s = r.map(x => (x._2._2).map(y => (x._1, y._1, y._2.toInt, y._3, y._4))).flatMap(k => k)
add-semi-colons
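The trailing `.map(...).flatMap(k => k)` in that answer flattens the per-record collections. On Scala collections this is the same as `.map(...).flatten` or a single `.flatMap(...)` with the same function, and the same identity should hold for an RDD. A small sketch on a plain `Seq` with made-up values checks the equivalence; the function `f` is hypothetical and plays the role of the answer's inner `map`.

```scala
object MapThenFlatten {
  // Made-up input: each key maps to a few values.
  val data: Seq[(Int, Seq[String])] = Seq(1 -> Seq("a", "b"), 2 -> Seq("c"))

  // f expands one record into several (key, value) pairs, like the answer's inner map.
  def f(rec: (Int, Seq[String])): Seq[(Int, String)] = rec._2.map(v => (rec._1, v))

  val viaMapFlatMap: Seq[(Int, String)] = data.map(f).flatMap(k => k) // as in the answer
  val viaFlatten: Seq[(Int, String)]    = data.map(f).flatten
  val viaFlatMap: Seq[(Int, String)]    = data.flatMap(f)             // single step

  def main(args: Array[String]): Unit = println(viaFlatMap)
}
```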