
How do I create a Spark Dataset from a JavaPairRDD in Java? Could you please help?

Kiran
  • I bookmarked the link, thanks ;-) @Kiran, what have you tried so far? Have you written any code? If so, what's wrong with it? Ideally, can you provide a minimal snippet of code that reproduces your problem? – Oli Dec 05 '19 at 09:28
  • Hey Kiran, have you found an answer to your question? – Oli Dec 13 '19 at 09:00

1 Answer


Basically, to go from a Dataset to a JavaPairRDD in Java, you first convert the Dataset to an RDD with javaRDD(), then convert that RDD to a pair RDD with mapToPair.

Here is an example:

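import static org.apache.spark.sql.functions.col;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import scala.Tuple2;

// create a Dataset of Rows with two long columns, x and y = x * x ("spark" is an existing SparkSession)
Dataset<Row> ds = spark
    .range(5)
    .select(col("id").alias("x"),
            col("id").multiply(col("id")).alias("y"));
// map each Row to a (key, value) tuple to get the pair RDD
JavaPairRDD<Long, Long> pairRDD = ds
    .javaRDD() // Dataset<Row> -> JavaRDD<Row>
    .mapToPair(row -> new Tuple2<>(row.getLong(0), row.getLong(1)));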
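The question asks about the opposite direction, from a pair RDD to a Dataset. Here is a minimal sketch of that, assuming Long keys and values and a local SparkSession. The class name PairRddToDataset, the method name toDataset, and the column names "x" and "y" are made up for illustration. It unwraps the JavaPairRDD with rdd() and passes a tuple encoder to createDataset:

```java
import java.util.Arrays;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import scala.Tuple2;

public class PairRddToDataset {

    // Hypothetical helper: turns a pair RDD of (Long, Long) into a two-column Dataset<Row>.
    static Dataset<Row> toDataset(SparkSession spark, JavaPairRDD<Long, Long> pairRDD) {
        // rdd() exposes the underlying RDD<Tuple2<Long, Long>>;
        // the tuple encoder tells Spark how to serialize each pair.
        Dataset<Tuple2<Long, Long>> tuples = spark.createDataset(
            pairRDD.rdd(),
            Encoders.tuple(Encoders.LONG(), Encoders.LONG()));
        // The tuple fields are named _1 and _2 by default; rename them (names are an assumption).
        return tuples.toDF("x", "y");
    }

    public static void main(String[] args) {
        // Assumption: a local session is enough for this demonstration.
        SparkSession spark = SparkSession.builder()
            .master("local[*]")
            .appName("pair-rdd-to-dataset")
            .getOrCreate();
        JavaSparkContext jsc = JavaSparkContext.fromSparkContext(spark.sparkContext());

        JavaPairRDD<Long, Long> pairRDD = jsc.parallelizePairs(Arrays.asList(
            new Tuple2<Long, Long>(1L, 1L),
            new Tuple2<Long, Long>(2L, 4L),
            new Tuple2<Long, Long>(3L, 9L)));

        toDataset(spark, pairRDD).show();
        spark.stop();
    }
}
```

For other key or value types you would swap in the matching encoders, for example Encoders.STRING(), or Encoders.bean(...) for a Java bean class.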
Oli