How to create a Spark Dataset from a pairRDD using Java? Could you please help?
- I bookmarked the link, thanks ;-) @Kiran, what have you tried so far? Have you written any code? If so, what's wrong with it? Ideally, can you provide a minimal snippet of code that reproduces your problem? – Oli Dec 05 '19 at 09:28
- Hey Kiran, have you found an answer to your question? – Oli Dec 13 '19 at 09:00
1 Answer
Basically, to go from a Dataset to a pair RDD in Java, you first convert the Dataset to an RDD using javaRDD(), and then convert that RDD to a pair RDD using mapToPair.
Here is an example:
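// imports needed by this snippet
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import scala.Tuple2;
import static org.apache.spark.sql.functions.col;

// create a Dataset of rows with x = id and y = id * id (spark is an existing SparkSession)
Dataset<Row> ds = spark
        .range(5)
        .select(col("id").alias("x"),
                col("id").multiply(col("id")).alias("y"));
// convert to a JavaRDD<Row>, then map each row to an (x, y) pair
JavaPairRDD<Long, Long> pairRDD = ds
        .javaRDD()
        .mapToPair(row -> new Tuple2<>(row.getLong(0), row.getLong(1)));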
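
The question itself asks for the opposite direction, from a pair RDD to a Dataset. Below is a minimal sketch of one way to do that, assuming a pair RDD with Long keys and Long values and a local SparkSession. The class name PairRddToDataset, the helper toDataset, the sample data and the column names x and y are illustrative assumptions, not something taken from the question.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import scala.Tuple2;

public class PairRddToDataset {

    // Hypothetical helper: wraps the underlying RDD of tuples in a typed Dataset,
    // using a tuple encoder that matches the key and value types.
    static Dataset<Tuple2<Long, Long>> toDataset(SparkSession spark,
                                                 JavaPairRDD<Long, Long> pairRDD) {
        return spark.createDataset(
                pairRDD.rdd(), // the underlying RDD<Tuple2<Long, Long>>
                Encoders.tuple(Encoders.LONG(), Encoders.LONG()));
    }

    public static void main(String[] args) {
        // Assumption: a local session is good enough for this illustration.
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("pair-rdd-to-dataset")
                .getOrCreate();
        JavaSparkContext jsc = JavaSparkContext.fromSparkContext(spark.sparkContext());

        // Sample data (illustrative only).
        List<Tuple2<Long, Long>> pairs = Arrays.asList(
                new Tuple2<>(0L, 0L),
                new Tuple2<>(1L, 1L),
                new Tuple2<>(2L, 4L));
        JavaPairRDD<Long, Long> pairRDD = jsc.parallelizePairs(pairs);

        // Typed Dataset of tuples; its columns are named _1 and _2 by default.
        Dataset<Tuple2<Long, Long>> typed = toDataset(spark, pairRDD);

        // Optionally rename the columns to get a Dataset<Row> with x and y.
        Dataset<Row> ds = typed.toDF("x", "y");
        ds.show();

        spark.stop();
    }
}
```

For other key and value types, the encoders passed to Encoders.tuple would need to change accordingly (for example Encoders.STRING() for String keys).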

Oli