I have a list of key-value pairs, such as List((A,1),(B,2),(C,3)), in heap memory. How can I parallelize this list to create a JavaPairRDD? In Scala: val pairs = sc.parallelize(List((A,1),(B,2),(C,3))). Is there an equivalent in the Java API?
0
-
Did you happen to RTFM? – eliasah Apr 29 '16 at 08:56
-
I did refer to the manual. I know how to do it with Scala and Python. Is there a way to do it with Java? – Sandeep Veerlapati Apr 29 '16 at 09:45
4 Answers
2
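I found the answer. First store the list of tuples in a JavaRDD, then convert it to a JavaPairRDD with JavaPairRDD.fromJavaRDD. Declaring the type parameters avoids raw-type warnings:
// sc is a JavaSparkContext; Tuple2 is scala.Tuple2
List<Tuple2<String, Integer>> data = Arrays.asList(new Tuple2<String, Integer>("panda", 0), new Tuple2<String, Integer>("panda", 1));
JavaRDD<Tuple2<String, Integer>> rdd = sc.parallelize(data);
JavaPairRDD<String, Integer> pairRdd = JavaPairRDD.fromJavaRDD(rdd);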
Have a look at this answer

Community

Sandeep Veerlapati
1
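This works for me. JavaSparkContext.parallelizePairs builds the JavaPairRDD directly from a list of tuples:
JavaPairRDD<String, String> pairRdd = sc.parallelizePairs(Arrays.asList(new Tuple2<String, String>("123", "123")));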

Andy

pranaygoyal02
0
Parallelized collections are created by calling JavaSparkContext’s parallelize method on an existing Collection in your driver program. The elements of the collection are copied to form a distributed dataset that can be operated on in parallel.
List data = ......;
JavaRDD rdd = sc.parallelize(data);

banjara
-
With the lines above you can only store plain elements, not key-value pairs. Also, I am trying to create a JavaPairRDD, not a JavaRDD. – Sandeep Veerlapati Apr 29 '16 at 09:15
-
Although this code may answer the question, providing additional context regarding _why_ and/or _how_ it answers the question would significantly improve its long-term value. Please [edit] your answer to add some explanation. – Toby Speight Apr 29 '16 at 14:54
-
@SandeepVeerlapati If the element type of your list is a tuple, I think Spark will create a pair RDD. – banjara Apr 29 '16 at 17:16
-
@shekar What you are saying is correct for the Scala API. – Sandeep Veerlapati May 02 '16 at 03:59
0
Put the tuple into a list with the code snippet below.
Tuple2<Sensor, Integer> tuple = new Tuple2<Sensor, Integer>(arg0._2, 1);
List<Tuple2<Sensor, Integer>> list = new ArrayList<Tuple2<Sensor, Integer>>();
list.add(tuple);
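
If the driver-side data starts out as a java.util.Map rather than as individual tuples, the same list-building step could look roughly like the sketch below. This is only an illustration: the class PairListSketch and the helper toPairs are hypothetical names, and Map.Entry is used as a stand-in for scala.Tuple2 so the code runs without Spark on the classpath. With Spark available, each entry would presumably become a new Tuple2 of the same key and value, and the resulting list could then be passed to parallelizePairs.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PairListSketch {

    // Hypothetical helper: copies each map entry into an immutable key-value pair.
    // Map.Entry stands in for scala.Tuple2 here (an assumption made so that
    // this sketch compiles with the JDK alone).
    static <K, V> List<Map.Entry<K, V>> toPairs(Map<K, V> source) {
        List<Map.Entry<K, V>> pairs = new ArrayList<>(source.size());
        for (Map.Entry<K, V> e : source.entrySet()) {
            pairs.add(new AbstractMap.SimpleImmutableEntry<K, V>(e.getKey(), e.getValue()));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the pairs come out as A, B, C.
        Map<String, Integer> source = new LinkedHashMap<>();
        source.put("A", 1);
        source.put("B", 2);
        source.put("C", 3);

        List<Map.Entry<String, Integer>> pairs = toPairs(source);
        System.out.println(pairs); // prints [A=1, B=2, C=3]
    }
}
```

The copy into immutable entries is a design choice rather than a requirement: it keeps the list independent of later changes to the source map.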

Rajeev Rathor