
I have an RDD of vectors. Say the values of the vectors in the RDD are the following:

1 1 1
2 2 2
3 3 3

I want to convert it to the following:

1 2 3
1 2 3
1 2 3

Either of the following two vector types is fine with me:

org.apache.spark.util.Vector

org.apache.spark.mllib.linalg.Vector 

It can be done locally by converting the RDD into a List/Array, but for big data that becomes impossible. I found some code online written for Scala. Any idea how I can do it in Java Spark? I am using Java 7, so no lambda expressions.
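For reference, this is the local approach I mean, in plain Java 7. With Spark one would first `collect()` the RDD into such an array, and pulling the whole matrix to the driver is exactly what does not scale:

```java
import java.util.Arrays;

public class LocalTranspose {
    // Naive transpose entirely in driver memory; only viable for small data.
    static double[][] transpose(double[][] m) {
        double[][] t = new double[m[0].length][m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                t[j][i] = m[i][j];
            }
        }
        return t;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 1, 1}, {2, 2, 2}, {3, 3, 3}};
        for (double[] row : transpose(m)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```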

**Edit:** I have added a comment explaining why the solutions to the possible duplicate question are not helping me.

zero323
Rajiur Rahman
  • possible duplicate of [How to transpose an RDD in Spark](http://stackoverflow.com/questions/29390717/how-to-transpose-an-rdd-in-spark) – Daniel Darabos Apr 30 '15 at 21:49
  • If you have found Scala solutions but have problems translating them into Java, it may be worth asking about the problem you're having with that. The Java and Scala APIs for Apache Spark are quite similar. What are you stuck with? – Daniel Darabos Apr 30 '15 at 21:52
  • @DanielDarabos, I checked your and the other solution from the possible duplicate question. Several things. First, I am not familiar with Scala. Second, I am not able to use lambda expressions as the rest of my code is already written in Java-7 (EC2 cluster is not supporting Java-8). I made a zipped RDD _JavaRDD.zipWithIndex()_ and right after that I am lost. – Rajiur Rahman May 01 '15 at 07:09
  • @DanielDarabos I am a newbie to Spark and I could use some expert advice here :) – Rajiur Rahman May 01 '15 at 07:11
  • I've never used these classes, but from the documentation it looks like you could use `zipWithIndex` to add the row indices and then `flatMap` to split each `Vector` into one `MatrixEntry` per element. – Daniel Darabos May 04 '15 at 17:59
  • @DanielDarabos, I have created the first RDD (byColumnAndRow) of the [possible solution](http://stackoverflow.com/questions/29390717/how-to-transpose-an-rdd-in-spark). I am fine with either a Tuple3 or a MatrixEntry(i, j, value). Now, **Scala** variables seem to be free of any data type. Could you please tell me a little more about the next two steps, groupByKey() and sortByKey(), in the context of **Java**? – Rajiur Rahman May 05 '15 at 23:45
  • 1
    I have never used Spark via Java, so I have no idea what difficulty you're even facing. The examples in http://spark.apache.org/docs/latest/programming-guide.html are all in three languages, so you should be able to use that as a "Rosetta Stone". If you get stuck with a particular problem, you should ask that as a new question here. And of course we can hope that someone familiar with the Java API will answer this question. (You can try a bounty if you like.) – Daniel Darabos May 06 '15 at 06:01
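The pipeline the comments sketch (`zipWithIndex` → flatMap each row into per-element entries keyed by column → `groupByKey` → `sortByKey`) could look roughly like this in Java 7 with anonymous inner classes. This is a sketch against the Spark 1.x Java API (where `PairFlatMapFunction.call` returns an `Iterable`); the class and variable names are mine, not from the question:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.PairFlatMapFunction;

import scala.Tuple2;

public class TransposeRDD {

    // Transpose a row-major RDD of arrays into an RDD of the matrix's
    // columns, emitted in column order.
    public static JavaRDD<double[]> transpose(JavaRDD<double[]> rows) {
        // Step 1: attach a stable row index to every row.
        JavaPairRDD<double[], Long> indexed = rows.zipWithIndex();

        // Step 2: explode each row into (columnIndex, (rowIndex, value)) entries.
        JavaPairRDD<Long, Tuple2<Long, Double>> byColumn = indexed.flatMapToPair(
                new PairFlatMapFunction<Tuple2<double[], Long>, Long, Tuple2<Long, Double>>() {
                    public Iterable<Tuple2<Long, Tuple2<Long, Double>>> call(
                            Tuple2<double[], Long> rowWithIndex) {
                        double[] row = rowWithIndex._1();
                        Long rowIndex = rowWithIndex._2();
                        List<Tuple2<Long, Tuple2<Long, Double>>> entries =
                                new ArrayList<Tuple2<Long, Tuple2<Long, Double>>>();
                        for (int col = 0; col < row.length; col++) {
                            entries.add(new Tuple2<Long, Tuple2<Long, Double>>(
                                    (long) col,
                                    new Tuple2<Long, Double>(rowIndex, row[col])));
                        }
                        return entries;
                    }
                });

        // Step 3: group the entries of each column, order the columns by
        // index, and inside each column order the values by original row index.
        return byColumn.groupByKey().sortByKey().map(
                new Function<Tuple2<Long, Iterable<Tuple2<Long, Double>>>, double[]>() {
                    public double[] call(Tuple2<Long, Iterable<Tuple2<Long, Double>>> column) {
                        Map<Long, Double> byRow = new TreeMap<Long, Double>();
                        for (Tuple2<Long, Double> entry : column._2()) {
                            byRow.put(entry._1(), entry._2());
                        }
                        double[] result = new double[byRow.size()];
                        int i = 0;
                        for (Double value : byRow.values()) {
                            result[i++] = value;
                        }
                        return result;
                    }
                });
    }

    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("transpose").setMaster("local[*]"));
        JavaRDD<double[]> rows = sc.parallelize(Arrays.asList(
                new double[]{1, 1, 1},
                new double[]{2, 2, 2},
                new double[]{3, 3, 3}));
        for (double[] column : transpose(rows).collect()) {
            System.out.println(Arrays.toString(column));
        }
        sc.stop();
    }
}
```

Note that `groupByKey` shuffles every element, so each column must still fit in one executor's memory; for very wide or very tall matrices the block-based `BlockMatrix.transpose` in later MLlib versions may be a better fit.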

0 Answers