Questions tagged [java-pair-rdd]

In the Spark Java API, RDDs of key-value pairs are represented by the JavaPairRDD class.

30 questions
0
votes
1 answer

Transform JavaPairRDD to DataFrame using Scala

I have a JavaPairRDD in the format below: org.apache.spark.api.java.JavaPairRDD[com.vividsolutions.jts.geom.Geometry,com.vividsolutions.jts.geom.Geometry]. The key is a polygon and the value is a point in the polygon, e.g.: [(polygon(1,2,3,4), POINT…
0
votes
0 answers

How to combine two JavaPairRDDs into a custom JavaPairRDD?

I have created the following JavaPairRDDs from the data received from different API endpoints. listHeaderRDD -> {list_id, list_details}, e.g. {1,{list_id:1,name:"abc",quantity:"2"}}, …
Dookoto_Sea
0
votes
1 answer

Iterating over an RDD Iterable in Scala

So I am new to Scala and just starting to work with RDDs and functional Scala operations. I am trying to iterate over the values of my Pair RDDs and return Var1 with the average of the values stored in Var2 by applying the defined averagefunction…
EliSquared
0
votes
0 answers

Apache-spark Error: Task failed while writing rows into sequenceFile

I am creating a JavaPairRDD and saving it in SequenceFile format with Apache Spark. The Spark version is 2.3. I am running this on a normal 4-node cluster, and the path is a normal HDFS path. I am doing it using Spark code (Java): JavaSparkContext sc = new…
DAVID_ROA
0
votes
1 answer

What is the right JavaRDD transformation to cluster rows on disjoint sets?

I have my rows set up in the JavaPairRDD where MyPojo is a POJO with an attribute (let's call it HashSet values). Now I want to cluster (merge) my rows based on any intersection with MyPojo.values. For example:
christo16
0
votes
0 answers

Can JavaPairRDD ever take an Array instead of a Tuple2 in Spark Java?

I am reading the "Learning Spark" book and, in example 5-14, I noticed that a JavaPairRDD was declared. I'm pretty sure that JavaPairRDDs can only take Tuple2s (i.e. for Key and Value) but I wasn't sure if there was some weird implicit…
howard
0
votes
1 answer

Java Spark how to save a JavaPairRDD, HashMap> to file?

I got this "JavaPairRDD, HashMap>" RDD after some complicated aggregations, and I want to save the result to a file. I believe saveAsHadoopFile is a good API to do so, but I am having trouble filling in the parameters for…
daydayup
0
votes
1 answer

How to intersect different JavaPairRDDs

I have two different JavaPairRDDs, one with key1,value and the second one with key2,value. What I am trying to achieve is to merge them but keep only the items with the same value. I have tried the following: JavaPairRDD finalRdd =…
Aikas91
0
votes
1 answer

Convert JavaPairRDD to JavaRDD

I am trying to read data from HBase using Apache Spark. I want to scan only one specific column. I am creating an RDD of my HBase data like below: SparkConf sparkConf = new…
InfamousCoder
0
votes
1 answer

How to generate JavaPairInputDStream from JavaStreamingContext?

I am learning Apache Spark Streaming and tried to generate a JavaPairInputDStream from a JavaStreamingContext. Below is my code: import java.util.ArrayList; import java.util.Arrays; import java.util.LinkedList; import java.util.List; import…
Joseph Hwang
0
votes
1 answer

What is the alternative for combineByKey while using Tuple3 in Apache Spark in Java?

I am just starting out with Apache Spark in Java. I am currently doing a mini project with some book data. I have to find the most popular author in each country. I have a pair RDD where the key is the country and the value is the author, like…
kaushik3993
0
votes
0 answers

JavaPairRDD - mapToPair() throws OutOfMemoryError

I am trying to iterate over a JavaPairRDD, apply a transformation to the value (which is a Java model class; the key is a String), and return the same key-value pair as a JavaPairRDD. Before throwing the OutOfMemoryError it says Marking Stage 5 (saveAsTextFile at…
Shankar
-3
votes
1 answer

How to apply flatMapToPair on a given RDD?

I have a JavaPairRDD>> named rddA. For example (after collecting rddA): [(word1,[(187,267), (224,311), (187,110)]), (word2,[(187,200), (10,90)])]. Thus, for example, word1 is the key and value is [(187,267),…
bib
-3
votes
1 answer

Write JavaPairRDD to CSV

JavaPairRDD has a saveAsTextFile function, with which you can save data in text format. However, what I need is to save the data as a CSV file, so I can use it later with Neo4j. My question is: how can I save the JavaPairRDD's data in CSV format? Or is…
A.HADDAD
-5
votes
1 answer

Converting a pair RDD to a Dataset in Spark using Java

How can I create a Spark Dataset from a pair RDD using Java? Could you please help?
Kiran