In the Spark Java API, RDDs of key-value pairs are represented by the JavaPairRDD class.
Questions tagged [java-pair-rdd]
30 questions
5
votes
1 answer
How to convert Dataset into JavaPairRDD?
There are methods to convert a Dataset to a JavaRDD.
Dataset dataFrame;
JavaRDD data = dataFrame.toJavaRDD();
Are there any other ways to convert a Dataset into a JavaPairRDD?

Manikandan Balasubramanian
4
votes
0 answers
How to collect Spark JavaPairRDD data as list
I am working on an Apache Spark 2.2.0 task in Java. I currently apply a mapToPair() function to my JavaRDD and get a JavaPairRDD as the result. Consider Table to be any Object type.
What I am trying to do now is to collect…

Omen
2
votes
3 answers
Transform Java-Pair-Rdd to Rdd
I need to transform my JavaPairRDD to CSV,
so I am thinking of transforming it to an RDD to solve my problem.
What I want is to have my RDD transformed
from:
Key Value
Jack [a,b,c]
to:
Key Value
Jack a
Jack b
Jack c
I see that it is…

A.HADDAD
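The Jack [a,b,c] to Jack a / Jack b / Jack c transformation described in the question above can be sketched with plain Java collections. This is only an illustration of the flattening logic, not the asker's code: the class name FlattenPairs and its methods are hypothetical, and the value is assumed to be a list of strings. In Spark, JavaPairRDD.flatMapValues is the operation usually used for this step; check its function signature for your Spark version.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: mimics flattening (key, [values]) pairs without Spark.
final class FlattenPairs {

    // Expands each (key, [v1, v2, ...]) pair into one (key, v) pair per value.
    static List<Map.Entry<String, String>> flatten(
            List<Map.Entry<String, List<String>>> pairs) {
        List<Map.Entry<String, String>> rows = new ArrayList<>();
        for (Map.Entry<String, List<String>> pair : pairs) {
            for (String value : pair.getValue()) {
                rows.add(new SimpleEntry<>(pair.getKey(), value));
            }
        }
        return rows;
    }

    // Renders one flattened pair as a CSV line (no quoting or escaping).
    static String toCsvLine(Map.Entry<String, String> row) {
        return row.getKey() + "," + row.getValue();
    }
}
```

With the single input pair (Jack, [a, b, c]), flatten returns three pairs, and toCsvLine renders them as Jack,a then Jack,b then Jack,c.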
2
votes
1 answer
Spark convert PairRDD to RDD
What is the best way to convert a PairRDD into an RDD in which K and V are merged (in Java)?
For example, the PairRDD contains K as some string and V as a JSON. I want to add this K to the value JSON and produce an RDD.
Input PairRDD
("abc",…

Manikandan Kannan
2
votes
0 answers
One field in Protocol Buffers is always missing when reading from SequenceFile
Something mysterious is happening for me:
What I wanted to do:
1. Save a Protocol Buffers object as SequenceFile format.
2. Read this SequenceFile text and extract the field that I need.
The mystery part is:
One field that I wanted to retrieve is…

Fisher Coder
1
vote
2 answers
How to get the Tuples in Java, if one of the Value is empty? IndexOutOfBound
public class App {
public static void main(String[] args) {
List> SubPartandMaster = new ArrayList>();
List wtpmList = new ArrayList();
wtpmList.add("1");
…

Manav Mehta
1
vote
0 answers
How to take a range of elements from JavaPairRDD
I am trying to get data from HBase using Spark.
JavaPairRDD javaPairRdd =
sc.newAPIHadoopRDD(hbaseConf,
TableInputFormat.class,ImmutableBytesWritable.class, Result.class);
But I need to get elements from a range.…

Pushpitha Dilhan
1
vote
1 answer
JavaPairRDD to Dataset in SPARK
I have data in a JavaPairRDD in the format
JavaPairRDD>>
I tried using the code below
Encoder>> encoder2 =
Encoders.tuple(Encoders.STRING(),…

Jack
1
vote
1 answer
how to use filter using containsAll and contains in javapairrdd
I have two collections: one is 'list' and the other is 'pairRdd2', which contain the data shown below.
I am trying to apply a filter with containsAll that checks whether my pairRdd2 contains all the values in list. Expected result is…

Jack
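The containsAll filter asked about above might look roughly like the following plain Java sketch. The data layout is an assumption, since the excerpt is truncated: each pair is taken to hold a string key and a list of string values, and the names ContainsAllFilter and keepContainingAll are hypothetical. On a real JavaPairRDD the same predicate would go inside filter, which receives each pair as a Tuple2.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: keeps only the pairs whose value list holds every required item.
final class ContainsAllFilter {

    static List<Map.Entry<String, List<String>>> keepContainingAll(
            List<Map.Entry<String, List<String>>> pairs, List<String> required) {
        return pairs.stream()
                // containsAll is true only if every element of 'required' is present.
                .filter(pair -> pair.getValue().containsAll(required))
                .collect(Collectors.toList());
    }
}
```

Note that containsAll on a List is a linear scan per required element; if the value lists are large, converting them to a HashSet first may be worth considering.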
1
vote
1 answer
Spark grouping and then sorting (Java code)
I have a JavaPairRDD and need to group by the key and then sort it using a value inside the object MyObject.
Let's say MyObject is:
class MyObject {
Integer order;
String name;
}
Sample data:
1, {order:1, name:'Joseph'}
1, {order:2,…

Magno C
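As a rough illustration of the group-then-sort step in the question above, here is a plain Java sketch. MyObject follows the two fields shown in the question; the GroupAndSort class, its constructor, and the method name are hypothetical. In Spark, one common approach is groupByKey followed by sorting each group's values inside mapValues, which assumes each group fits in memory on one executor.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups pairs by key, then sorts each group by MyObject.order.
final class GroupAndSort {

    static final class MyObject {
        final Integer order;
        final String name;

        MyObject(Integer order, String name) {
            this.order = order;
            this.name = name;
        }
    }

    static Map<Integer, List<MyObject>> groupAndSort(
            List<Map.Entry<Integer, MyObject>> pairs) {
        // TreeMap keeps the keys themselves in ascending order.
        Map<Integer, List<MyObject>> grouped = new TreeMap<>();
        for (Map.Entry<Integer, MyObject> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        // Sort the values of every group by their 'order' field.
        for (List<MyObject> group : grouped.values()) {
            group.sort(Comparator.comparing((MyObject o) -> o.order));
        }
        return grouped;
    }
}
```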
1
vote
2 answers
Convert JavaPairRDD to Dataframe in Spark Java API
I am using Spark 1.6 with Java 7
I have a pair RDD:
JavaPairRDD filesRDD = sc.wholeTextFiles(args[0]);
I want to convert it into a DataFrame with a schema.
It seems that I first have to convert the pair RDD to a Row RDD.
So how do I create a Row RDD…

Mitul Modi
0
votes
1 answer
How to count instances of a key in a JavaPairRDD Java Spark
To elaborate on what I'm stuck on or unsure of how to approach, I currently have a JavaPairRDD "media" that contains two integer values, a followed id and a follower id. What I'm trying to do is count the number of times the key integer (followed…
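Counting how often each key occurs can be sketched in plain Java as below. The (followed id, follower id) layout is taken from the question; the class name KeyCounter is hypothetical. In Spark, JavaPairRDD.countByKey() performs this count and returns the result as a Map to the driver, so it is suitable only when the number of distinct keys is small enough to fit there.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: counts how many pairs share each key.
final class KeyCounter {

    static Map<Integer, Long> countByKey(List<Map.Entry<Integer, Integer>> pairs) {
        Map<Integer, Long> counts = new HashMap<>();
        for (Map.Entry<Integer, Integer> pair : pairs) {
            // Start at 1 for a new key, otherwise add 1 to the running count.
            counts.merge(pair.getKey(), 1L, Long::sum);
        }
        return counts;
    }
}
```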
0
votes
1 answer
CustomPartiton a JavaPairRDD
I have created a JavaPairRDD from two different datasets: the first is the output file from the METIS graph partitioning algorithm, and the second is the input graph for the METIS graph partitioner. The key-value pair of the JavaPairRDD is constructed…

Aavash Bhandari
0
votes
1 answer
java.lang.OutOfMemoryError: Java heap space AND org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 4
I try to execute the code and I get the following errors:
java.lang.OutOfMemoryError: Java heap space
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 4
The code can execute on small files (a few KB),…
0
votes
0 answers
error of accessing first element in JavaPairRDD from Java SparkContext in IntelliJ
I am trying to run example Java/Spark code from IntelliJ IDEA on a MacBook Pro.
My Java version:
12.0.2
Also, I have run:
mvn dependency:copy-dependencies
in the folder with pom.xml.
It works well.
My code:
SparkConf conf = new SparkConf()
…

user3448011