
I have multiple input sources, and I have used Sqoop's codegen tool to generate a custom class for each input source:

    public class SQOOP_REC1 extends SqoopRecord implements DBWritable, Writable

    public class SQOOP_REC2 extends SqoopRecord implements DBWritable, Writable

On the map side, I create objects of one of the above two classes, depending on the input source.

The key is of type Text, and since I have two different value types, I declared the map output value type as Writable.

On the reduce side, I accept the value type as Writable.

    public class SkeletonReduce extends Reducer<Text, Writable, Text, Text> {

        public void reduce(Text key, Iterable<Writable> values, Context context)
                throws IOException, InterruptedException {

        }
    }

I also set

job.setMapOutputValueClass(Writable.class);

During execution, it does not enter the reduce function at all.

Could someone tell me if it is possible to do this? If so, what am I doing wrong?

haden

3 Answers


You can't specify Writable as your output type; it has to be a concrete type. All records must have the same (concrete) key and value types, in both mappers and reducers. If you need different types, you can create some kind of hybrid Writable that contains either an "A" or a "B" inside. It's a little ugly, but it works, and it's done a lot in Mahout, for example.
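
A minimal sketch of what such a hybrid value class might look like (the class name, flag, and accessors are illustrative, not generated by Sqoop):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.Writable;

    // Carries either a SQOOP_REC1 or a SQOOP_REC2, plus a flag saying which one is valid.
    public class HybridRecord implements Writable {

        private boolean isRec1;
        private SQOOP_REC1 rec1 = new SQOOP_REC1();
        private SQOOP_REC2 rec2 = new SQOOP_REC2();

        public void set(SQOOP_REC1 r) { isRec1 = true;  rec1 = r; }
        public void set(SQOOP_REC2 r) { isRec1 = false; rec2 = r; }

        public boolean isRec1()     { return isRec1; }
        public SQOOP_REC1 getRec1() { return rec1; }
        public SQOOP_REC2 getRec2() { return rec2; }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeBoolean(isRec1);                      // write the flag first
            if (isRec1) { rec1.write(out); } else { rec2.write(out); }
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            isRec1 = in.readBoolean();                     // read the flag, then the matching record
            if (isRec1) { rec1.readFields(in); } else { rec2.readFields(in); }
        }
    }

The mapper always emits a HybridRecord, and the reducer checks the flag to see which record is actually populated.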

But I don't know why any of this would make the reducer not run; that is likely a separate issue and not answerable from the information given here.

Sean Owen

Look into extending GenericWritable for your value type. You need to define the set of classes that are allowed (SQOOP_REC1 and SQOOP_REC2 in your case). It's not as efficient, because it creates new object instances in the readFields method, but if you have a small set of classes you can override that behavior: keep instance variables of both types and a flag denoting which one is valid.
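
A minimal sketch, assuming a wrapper class name of your own choosing (SqoopRecordWrapper here is illustrative):

    import org.apache.hadoop.io.GenericWritable;
    import org.apache.hadoop.io.Writable;

    // Declares the closed set of value classes allowed to travel through the shuffle.
    public class SqoopRecordWrapper extends GenericWritable {

        @SuppressWarnings("unchecked")
        private static final Class<? extends Writable>[] TYPES = new Class[] {
            SQOOP_REC1.class,
            SQOOP_REC2.class
        };

        @Override
        protected Class<? extends Writable>[] getTypes() {
            return TYPES;
        }
    }

The mapper would then call set() on the wrapper with whichever record it built and emit the wrapper, the job would use setMapOutputValueClass(SqoopRecordWrapper.class), and the reducer would call get() and check the instance type.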

Chris White

OK, I think I figured out how to do this, based on a suggestion given by Doug Cutting himself:

http://grokbase.com/t/hadoop/common-user/083gzhd6zd/multiple-output-value-classes

I wrapped the record in an ObjectWritable:

    ObjectWritable obj = new ObjectWritable(SQOOP_REC2.class, sqoop_rec2);

Then, on the reduce side, I can get the declared class of the wrapped object and cast it back to the original class:

    if (val.getDeclaredClass().getName().equals("SQOOP_REC2")) {
        SQOOP_REC2 temp = (SQOOP_REC2) val.get();
        // ...
    }

And don't forget

        job.setMapOutputValueClass(ObjectWritable.class);
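
For completeness, here is roughly how the reduce side fits together with ObjectWritable (simplified; the per-record handling is elided):

    import java.io.IOException;

    import org.apache.hadoop.io.ObjectWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SkeletonReduce extends Reducer<Text, ObjectWritable, Text, Text> {

        @Override
        public void reduce(Text key, Iterable<ObjectWritable> values, Context context)
                throws IOException, InterruptedException {
            for (ObjectWritable val : values) {
                // getDeclaredClass() tells us which Sqoop record the mapper wrapped.
                if (val.getDeclaredClass() == SQOOP_REC1.class) {
                    SQOOP_REC1 rec1 = (SQOOP_REC1) val.get();
                    // ... handle SQOOP_REC1 ...
                } else if (val.getDeclaredClass() == SQOOP_REC2.class) {
                    SQOOP_REC2 rec2 = (SQOOP_REC2) val.get();
                    // ... handle SQOOP_REC2 ...
                }
            }
        }
    }
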
haden