Questions tagged [java-pair-rdd]

In the Spark Java API, RDDs of key-value pairs are represented by the JavaPairRDD class.
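For context, a minimal sketch of building one with mapToPair, assuming a JavaRDD<String> called lines whose records look like "key,value" (the record layout is an assumption, not part of the tag description):

    // Split each "key,value" line into a Tuple2 to get a JavaPairRDD.
    JavaPairRDD<String, String> pairs = lines.mapToPair(line -> {
        String[] parts = line.split(",", 2);
        return new Tuple2<>(parts[0], parts[1]);
    });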

30 questions
5
votes
1 answer

How to convert Dataset into JavaPairRDD?

There are methods to convert a Dataset to a JavaRDD: Dataset dataFrame; JavaRDD data = dataFrame.toJavaRDD(); Are there other ways to convert a Dataset into a JavaPairRDD?
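A common approach (sketch only; the key column used here is a hypothetical placeholder, not taken from the question) is to go through toJavaRDD() and then mapToPair():

    // Dataset<Row> -> JavaRDD<Row> -> JavaPairRDD<String, Row>.
    // The choice of column 0 as the key is an assumption for illustration.
    JavaPairRDD<String, Row> pairs = dataFrame.toJavaRDD()
        .mapToPair(row -> new Tuple2<>(row.getString(0), row));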
4
votes
0 answers

How to collect Spark JavaPairRDD data as a list

I am working on an Apache Spark 2.2.0 task in Java. I currently perform a mapToPair() function over my JavaRDD and get a JavaPairRDD as the result. Consider Table to be any Object type. What I am trying to do now is to collect…
Omen
  • 313
  • 3
  • 13
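For reference, collect() on a JavaPairRDD already returns the pairs as a driver-side list (a sketch, assuming a JavaPairRDD<String, Table> named pairRdd):

    // Gather all (key, value) tuples to the driver as a List.
    List<Tuple2<String, Table>> asList = pairRdd.collect();
    // Or as a Map, where later duplicates of a key overwrite earlier ones.
    Map<String, Table> asMap = pairRdd.collectAsMap();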
2
votes
3 answers

Transform JavaPairRDD to RDD

I need to transform my JavaPairRDD to a CSV, so I am thinking of transforming it to an RDD to solve my problem. What I want is to have my RDD transformed from: Key Value Jack [a,b,c] to: Key Value Jack a Jack b Jack c. I see that it is…
A.HADDAD
  • 1,809
  • 4
  • 26
  • 51
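One way to do this (a sketch, assuming the values are List<String>) is flatMapValues, which emits one pair per list element, followed by map() to build the CSV lines:

    // ("Jack", [a,b,c]) becomes ("Jack","a"), ("Jack","b"), ("Jack","c").
    // Spark 1.x/2.x signature shown: flatMapValues takes Function<V, Iterable<U>>.
    JavaPairRDD<String, String> flat = pairRdd.flatMapValues(values -> values);
    // Turn each pair into a CSV line, giving a plain JavaRDD<String>.
    JavaRDD<String> csvLines = flat.map(t -> t._1() + "," + t._2());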
2
votes
1 answer

Spark convert PairRDD to RDD

What is the best way (in Java) to convert a PairRDD into an RDD in which K and V are merged? For example, the PairRDD contains K as some string and V as JSON. I want to add this K to the value JSON and produce an RDD. Input PairRDD ("abc",…
Manikandan Kannan
  • 8,684
  • 15
  • 44
  • 65
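A rough sketch of one possibility (string-based, assuming each value is a non-empty JSON object; a JSON library such as Jackson would be more robust):

    // Inject the key as an extra "key" field at the start of the JSON value,
    // then the result is a plain RDD of JSON strings.
    JavaRDD<String> merged = pairRdd.map(t ->
        t._2().replaceFirst("\\{", "{\"key\":\"" + t._1() + "\","));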
2
votes
0 answers

One field in Protocol Buffers is always missing when reading from SequenceFile

Something mysterious is happening: what I wanted to do was 1. save a Protocol Buffers object in SequenceFile format, and 2. read this SequenceFile back and extract the field that I need. The mystery part is: one field that I wanted to retrieve is…
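For comparison, a hedged sketch of reading protobuf bytes back from a SequenceFile of (Text, BytesWritable) records; MyProto stands in for the generated protobuf class, and the writable key/value types and the path are assumptions:

    // Read the SequenceFile and parse each value's bytes with the generated parser.
    JavaPairRDD<Text, BytesWritable> raw =
        sc.sequenceFile("/path/to/seqfile", Text.class, BytesWritable.class);
    JavaRDD<MyProto> parsed = raw.map(t -> MyProto.parseFrom(t._2().copyBytes()));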
1
vote
2 answers

How to get the Tuples in Java if one of the Values is empty? IndexOutOfBoundsException

public class App { public static void main(String[] args) { List> SubPartandMaster = new ArrayList>(); List wtpmList = new ArrayList(); wtpmList.add("1"); …
1
vote
0 answers

How to take a range of elements from JavaPairRDD

I am trying to get data from HBase using Spark. JavaPairRDD javaPairRdd = sc.newAPIHadoopRDD(hbaseConf, TableInputFormat.class, ImmutableBytesWritable.class, Result.class); But I need to get elements from a range.…
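One option (a sketch; the row-key bounds are placeholders) is to push the range down to HBase by setting scan start/stop rows on the configuration before creating the RDD:

    // Restrict the scan to a row-key range instead of filtering afterwards.
    hbaseConf.set(TableInputFormat.SCAN_ROW_START, "startRowKey");
    hbaseConf.set(TableInputFormat.SCAN_ROW_STOP, "stopRowKey");
    JavaPairRDD<ImmutableBytesWritable, Result> javaPairRdd = sc.newAPIHadoopRDD(
        hbaseConf, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);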
1
vote
1 answer

JavaPairRDD to Dataset in Spark

I have data in a JavaPairRDD in the format JavaPairRDD>> I tried using the code below: Encoder>> encoder2 = Encoders.tuple(Encoders.STRING(),…
Jack
  • 197
  • 1
  • 21
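A sketch of the usual route (the String/Integer types are placeholders for the question's actual value types): a JavaPairRDD is an RDD of Tuple2, so a tuple Encoder plus createDataset can work:

    // Build an Encoder for the tuple shape, then create the Dataset from the
    // underlying RDD<Tuple2<...>>.
    Encoder<Tuple2<String, Integer>> enc =
        Encoders.tuple(Encoders.STRING(), Encoders.INT());
    Dataset<Tuple2<String, Integer>> ds = spark.createDataset(pairRdd.rdd(), enc);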
1
vote
1 answer

How to use filter with containsAll and contains in JavaPairRDD

I have 2 collections: one is 'list' and the other is 'pairRdd2', which contains the data mentioned below. I am trying to apply a filter with containsAll, keeping a pair only if pairRdd2 contains all the values mentioned in list. Expected result is…
Jack
  • 197
  • 1
  • 21
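A sketch of the filter itself, assuming pairRdd2's values are Java collections and list is a serializable List<String>:

    // Keep only pairs whose value collection holds every element of `list`;
    // contains(...) instead would keep pairs holding one specific element.
    JavaPairRDD<String, List<String>> filtered =
        pairRdd2.filter(t -> t._2().containsAll(list));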
1
vote
1 answer

Spark grouping and then sorting (Java code)

I have a JavaPairRDD and need to group by the key and then sort by a value inside the object MyObject. Let's say MyObject is: class MyObject { Integer order; String name; } Sample data: 1, {order:1, name:'Joseph'} 1, {order:2,…
Magno C
  • 1,922
  • 4
  • 28
  • 53
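One possible shape for this (a sketch; it assumes each key's whole group fits in memory):

    // Group by key, then sort each group's values by MyObject.order.
    JavaPairRDD<Integer, List<MyObject>> grouped = pairRdd
        .groupByKey()
        .mapValues(values -> {
            List<MyObject> list = new ArrayList<>();
            values.forEach(list::add);
            list.sort(Comparator.comparing((MyObject o) -> o.order));
            return list;
        });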
1
vote
2 answers

Convert JavaPairRDD to Dataframe in Spark Java API

I am using Spark 1.6 with Java 7. I have a pair RDD: JavaPairRDD filesRDD = sc.wholeTextFiles(args[0]); I want to convert it into a DataFrame with a schema. It seems that first I have to convert the pairRDD to a RowRDD. So how to create a RowRdd…
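A sketch of the RowRDD step (lambda syntax shown for brevity; with Java 7 an anonymous Function would be needed, and the field names are illustrative):

    // Map each (path, content) pair to a Row, then apply an explicit schema.
    JavaRDD<Row> rowRDD = filesRDD.map(t -> RowFactory.create(t._1(), t._2()));
    StructType schema = new StructType(new StructField[] {
        DataTypes.createStructField("path", DataTypes.StringType, false),
        DataTypes.createStructField("content", DataTypes.StringType, true)
    });
    DataFrame df = sqlContext.createDataFrame(rowRDD, schema);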
0
votes
1 answer

How to count instances of a key in a JavaPairRDD Java Spark

To elaborate on what I'm stuck on or unsure of how to approach: I currently have a JavaPairRDD "media" that contains two integer values, a followed id and a follower id. What I'm trying to do is count the number of times the key integer (followed…
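For reference, a sketch of two common ways to count occurrences per key (the RDD name media follows the question; the rest is assumed):

    // Driver-side: countByKey() returns a Map of key -> number of occurrences.
    Map<Integer, Long> counts = media.countByKey();
    // Distributed alternative: one pair per key carrying its occurrence count.
    JavaPairRDD<Integer, Long> countsRdd =
        media.mapValues(v -> 1L).reduceByKey(Long::sum);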
0
votes
1 answer

Custom-partition a JavaPairRDD

I have created a JavaPairRDD from two different datasets: the first is the output file from the METIS graph partitioning algorithm, and the second is the input graph for the METIS graph partitioner. The key-value pair of the JavaPairRDD is constructed…
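A hedged sketch of a custom Partitioner that routes each vertex key to its METIS-assigned partition; the Long key type and the assignment map are assumptions, not details from the question:

    // A Partitioner backed by a (vertex -> METIS partition) map.
    class MetisPartitioner extends Partitioner {
        private final int numParts;
        private final Map<Long, Integer> assignment;
        MetisPartitioner(int numParts, Map<Long, Integer> assignment) {
            this.numParts = numParts;
            this.assignment = assignment;
        }
        @Override public int numPartitions() { return numParts; }
        @Override public int getPartition(Object key) {
            return assignment.getOrDefault((Long) key, 0) % numParts;
        }
    }
    // Repartition the pair RDD according to the METIS assignment.
    JavaPairRDD<Long, Long> partitioned =
        pairRdd.partitionBy(new MetisPartitioner(4, assignment));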
0
votes
1 answer

java.lang.OutOfMemoryError: Java heap space AND org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 4

I try to execute the code and I get the following errors: java.lang.OutOfMemoryError: Java heap space and org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 4. The code executes fine on small files (a few KB),…
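These two errors usually point at executors running out of memory during the shuffle. A hedged sketch of the usual first knobs to try (values are placeholders; memory settings are normally passed to spark-submit rather than set after the JVM has started):

    // Illustrative tuning: more executor memory and more shuffle partitions
    // spread the shuffle over smaller, cheaper tasks.
    SparkConf conf = new SparkConf()
        .set("spark.executor.memory", "4g")
        .set("spark.sql.shuffle.partitions", "400")
        .set("spark.default.parallelism", "400");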
0
votes
0 answers

Error accessing the first element of a JavaPairRDD from a Java SparkContext in IntelliJ

I am trying to run example Java/Spark code from IntelliJ IDEA on a MacBook Pro. My Java version: 12.0.2. Also, I have run mvn dependency:copy-dependencies in the folder with pom.xml; it works well. My code: SparkConf conf = new SparkConf() …
user3448011
  • 1,469
  • 1
  • 17
  • 39