I want to convert the following RDD into a single RDD of the form (id, (all names that share that id)).

>val test = rddByKey.map{case(k,v)=> (k,v.collect())}  

test: Array[(String, Array[String])] =   
  Array(
    (45000,Array(Amit, Pavan, Ratan)),
    (10000,Array(Kumar, Venkat, Sheela)), 
    (50000,Array(Tejas, Dinesh, Lokesh, Bhupesh))
  )

I want to print it like this:

(45000,(Amit, Pavan, Ratan))
(10000,(Kumar, Venkat, Sheela))

This is what I have tried:

val data = sc.textFile("/user/cloudera/data.csv") 
val rdd = data.map(r=>(r.split(",")(0),r.split(",")(1))) 
val groupByKey = rdd.groupByKey().collect() 
val rddByKey = groupByKey.map{case(k,v) => k->sc.makeRDD(v.toSeq)} 
val test = rddByKey.map{case(k,v)=> (k,v.collect())}
Ramesh Maharjan
Biswajit
  • I'm not sure what exactly you are trying to achieve here. You have `rddByKey` and you want to split it, but how exactly? Can you add some clearer input/expected output examples? – Shaido Mar 13 '18 at 02:07
  • How can `rdd.map` possibly become an `array`? – Xavier Guihot Mar 13 '18 at 06:59
  • `val data = sc.textFile("/user/cloudera/data.csv") val rdd = data.map(r=>(r.split(",")(0),r.split(",")(1))) val groupByKey = rdd.groupByKey().collect() val rddByKey = groupByKey.map{case(k,v) => k->sc.makeRDD(v.toSeq)} val test = rddByKey.map{case(k,v)=> (k,v.collect())}` <------ I want to collect it as (k, (values of the array for the same key)) – Biswajit Mar 13 '18 at 08:44
  • Why would you want to print it like that? – Ramesh Maharjan Mar 13 '18 at 09:56

1 Answer

You don't have to go through the complexity of using collect. You can simply do:

val data = sc.textFile("/user/cloudera/data.csv")
val rdd = data.map(r=>{
  val x = r.split(",")
  (x(0),x(1))
})
val groupByKey = rdd.groupByKey().map{case (x, y) => (x :: y.toList)}

groupByKey then contains:

List(45000, Amit, Pavan, Ratan)
List(10000, Kumar, Venkat, Sheela)
List(50000, Tejas, Dinesh, Lokesh, Bhupesh)
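If you want lines in the exact (id,(names)) shape from the question, one possible sketch is to keep the key separate and join only the values with mkString. This assumes a spark-shell session where `sc` is predefined; the inline rows stand in for data.csv, and the "id,name" column layout is an assumption based on the question. The names `lines`, `pairs`, `formatted` and `printed` are illustrative only.

```scala
// Assumes spark-shell, where `sc` (the SparkContext) is already defined.
// Inline rows replace /user/cloudera/data.csv; the "id,name" layout is assumed.
val lines = sc.parallelize(Seq(
  "45000,Amit", "45000,Pavan", "45000,Ratan",
  "10000,Kumar", "10000,Venkat", "10000,Sheela"
))

// Split each row once and keep (id, name).
val pairs = lines.map { row =>
  val cols = row.split(",")
  (cols(0), cols(1))
}

// Group the names by id, then render each group as "(name1, name2, ...)".
val formatted = pairs.groupByKey().mapValues(names => names.mkString("(", ", ", ")"))

// collect() brings the small result to the driver; build one line per id and print it.
val printed = formatted.collect().map { case (id, names) => s"($id,$names)" }
printed.foreach(println)
```

Each printed line has the form (45000,(Amit, Pavan, Ratan)). The order of the lines, and of the names within a group, is not guaranteed after a shuffle, so sort explicitly if the order matters.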

I hope the answer is helpful

Ramesh Maharjan
  • Can you please tell me how to fetch the index(0) value of a flatMap result? My code is below; I am not able to get the value of content1(0). Code: `val file1 = sc.textFile("/user/cloudera/file1.txt") val content1 = file1.flatMap(line=>line.split(" ")).map(word => (word,1)) val file1Word = sc.makeRDD(Array(content1(0)._1+content1(0)._2))` – Biswajit Mar 13 '18 at 18:07
  • For now I will just answer your comment above, but next time please ask a separate question for a new topic. It seems that you are trying to do a word count; if that's so, then this code should work for you: `val file1 = sc.textFile("/user/cloudera/file1.txt") val content1 = file1.flatMap(line=>line.split(" ")).map(word => (word,1)) val file1Word = content1.reduceByKey(_ + _)` If the above answer helped you, then please consider accepting it. :) – Ramesh Maharjan Mar 14 '18 at 07:22
  • For example, the output of content1 is (are,2) (is,1) (the,1). Now I want to get {(is,1)}. Please suggest how I can get it. – Biswajit Mar 14 '18 at 09:47
  • I did. Please share your answer. The value of content1 after flatMap is (are,2) (is,1) (the,1). Now I want to get {(is,1)}. How is that possible? – Biswajit Mar 14 '18 at 23:22
  • You should use filter: `val file1 = sc.textFile("/user/cloudera/file1.txt") val content1 = file1.flatMap(line=>line.split(" ")).map(word => (word,1)).reduceByKey(_ + _) content1.filter(_._1 == "is")` and you should get (is,1). – Ramesh Maharjan Mar 15 '18 at 08:08
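
The word-count and filter steps from the comments above can be sketched as one runnable snippet. It assumes spark-shell (`sc` predefined) and uses two made-up sentences in place of file1.txt. An RDD has no positional index, which is why `content1(0)` does not compile; to pull out one entry you filter by key, or use `first()` or `take(1)` if any single element will do.

```scala
// Assumes spark-shell, where `sc` (the SparkContext) is already defined.
// The sample sentences are made up and stand in for /user/cloudera/file1.txt.
val file1 = sc.parallelize(Seq("are the", "is are"))

// Word count: split into words, pair each word with 1, then sum the counts per word.
val content1 = file1
  .flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)

// Keep only the entry whose key is "is" and bring it to the driver.
val onlyIs = content1.filter { case (word, _) => word == "is" }.collect()
onlyIs.foreach(println) // prints (is,1)
```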