
I have a JavaPairRDD in the following format:

JavaPairRDD< String, Tuple2< String, List< String>>> myData;

I want to save it in a key-value format (String, Tuple2< String, List< String>>):

myData.saveAsXXXFile("output-path");

so that my next job can read the data directly into a JavaPairRDD:

JavaPairRDD< String, Tuple2< String, List< String>>> newData = context.XXXFile("output-path");

I am using Java 7 and Spark 1.2 with the Java API. I tried saveAsTextFile and saveAsObjectFile, but neither works, and I don't see a saveAsSequenceFile option in Eclipse.

Does anyone have any suggestion for this problem? Thank you very much!

Justin Pihony
Edamame

1 Answer


You could use SequenceFileRDDFunctions, which Scala exposes through implicits; from Java, however, that is likely to be messier than the usual suggestion:

myData.saveAsHadoopFile(fileName, Text.class, CustomWritable.class,
                        SequenceFileOutputFormat.class);

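Here CustomWritable is a class of your own (the RDD must first be mapped to Text keys and CustomWritable values) that implements the interface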

org.apache.hadoop.io.Writable

Something like this should work (did not check for compilation):

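import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

import scala.Tuple2;

// Writable is an interface, so it is implemented rather than extended.
public class MyWritable implements Writable {
  private String _1;
  private String[] _2;

  // Hadoop creates Writables reflectively, so a no-arg constructor is required.
  public MyWritable() {}

  public MyWritable(Tuple2<String, String[]> data) {
    _1 = data._1();
    _2 = data._2();
  }

  public Tuple2<String, String[]> get() {
    return new Tuple2<String, String[]>(_1, _2);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    // Must mirror write(): the string, the element count, then each element.
    _1 = Text.readString(in);
    _2 = new String[in.readInt()];
    for (int i = 0; i < _2.length; i++) {
      _2[i] = Text.readString(in);
    }
  }

  @Override
  public void write(DataOutput out) throws IOException {
    Text.writeString(out, _1);
    out.writeInt(_2.length);
    for (String s : _2) {
      Text.writeString(out, s);
    }
  }
}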

Adjust the field types so that it fits your data model (for example, List< String> instead of String[]).
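For the (String, List< String>) shape from the question, the write/readFields symmetry can be sketched with java.io alone, since both Writable methods take java.io.DataOutput and java.io.DataInput. This is only an illustration under assumptions: the class name PairRecord, its getters, and the roundTrip helper are hypothetical, and the class deliberately does not implement org.apache.hadoop.io.Writable so that it runs without Hadoop on the classpath. Declaring `implements Writable` would be the assumed next step for real use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Writable that wraps (String, List<String>).
class PairRecord {
  private String first = "";
  private List<String> second = new ArrayList<String>();

  // No-arg constructor, as Hadoop would need for reflective instantiation.
  PairRecord() {}

  PairRecord(String first, List<String> second) {
    this.first = first;
    this.second = new ArrayList<String>(second);
  }

  String getFirst() { return first; }
  List<String> getSecond() { return second; }

  // Same signature as Writable.write: the string, the list size, then each element.
  // Note: writeUTF is limited to strings whose encoded form fits in 65535 bytes.
  public void write(DataOutput out) throws IOException {
    out.writeUTF(first);
    out.writeInt(second.size());
    for (String s : second) {
      out.writeUTF(s);
    }
  }

  // Same signature as Writable.readFields: reads fields in the order write() emitted them.
  public void readFields(DataInput in) throws IOException {
    first = in.readUTF();
    int n = in.readInt();
    second = new ArrayList<String>(n);
    for (int i = 0; i < n; i++) {
      second.add(in.readUTF());
    }
  }

  // Serializes a record to bytes and reads it back into a fresh instance.
  static PairRecord roundTrip(PairRecord record) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      record.write(new DataOutputStream(bytes));
      PairRecord copy = new PairRecord();
      copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
      return copy;
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Writing the element count before the elements is what lets readFields know how many strings to read back; any change to the order in write must be mirrored in readFields.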

Justin Pihony
  • In this case, my CustomWritable is Tuple2< String, List< String>> . I don't think I could make Tuple2< String, List< String>> Writable? Can I? – Edamame Apr 06 '15 at 20:55
  • You could use implicits in scala to make it look like it is inherently possible. However, given Java, it will be best to just create a MyTuple2Writable and just map to it. – Justin Pihony Apr 07 '15 at 02:18
  • Thanks Justin. Could you point me to any document or example that could make Tuple2< String, List< String>> Writable ? Thanks! – Edamame Apr 07 '15 at 17:31