
I have a custom MyInputFormat that is supposed to handle the record boundary problem for multi-line input. But when I plug MyInputFormat into my UDF load function, as follows:

import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.pig.LoadFunc;
public class EccUDFLogLoader extends LoadFunc {
    @Override
    public InputFormat getInputFormat() {
        System.out.println("I am in getInputFormat function");
        return new MyInputFormat();
    }
}
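
For context, the full loader has a few more overrides than shown above. Here is a minimal sketch of the same class with those filled in; the getNext body, which emits one Text line per tuple, is only illustrative:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.pig.LoadFunc;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class EccUDFLogLoader extends LoadFunc {
    private RecordReader reader;
    private final TupleFactory tupleFactory = TupleFactory.getInstance();

    @Override
    public InputFormat getInputFormat() {
        System.out.println("I am in getInputFormat function");
        return new MyInputFormat();
    }

    @Override
    public void setLocation(String location, Job job) throws IOException {
        // tell Hadoop where the input lives
        FileInputFormat.setInputPaths(job, new Path(location));
    }

    @Override
    public void prepareToRead(RecordReader reader, PigSplit split) {
        // Pig hands us the RecordReader our InputFormat created
        this.reader = reader;
    }

    @Override
    public Tuple getNext() throws IOException {
        try {
            if (!reader.nextKeyValue()) {
                return null; // no more records in this split
            }
            // illustrative: wrap each Text value in a single-field tuple
            Text line = (Text) reader.getCurrentValue();
            return tupleFactory.newTuple(line.toString());
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }
}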

import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapred.JobConf;
public class MyInputFormat extends TextInputFormat {
    public RecordReader createRecordReader(InputSplit inputSplit, JobConf jobConf) throws IOException {
        System.out.println("I am in createRecordReader");
        //MyRecordReader is supposed to handle the record boundary
        return new MyRecordReader((FileSplit)inputSplit, jobConf);
    }
}

For each mapper, it prints out `I am in getInputFormat function` but not `I am in createRecordReader`. Can anyone provide a hint on how to hook up my custom MyInputFormat to Pig's UDF loader? Many thanks.

I am using Pig on Amazon EMR.

  • try adding an `@Override` annotation on the `createRecordReader` method to ensure you have the correct signature – Chris White Dec 19 '12 at 00:02

1 Answer


Your signature doesn't match that of the parent class (you're missing the `Reporter` argument). Try this:

@Override
public RecordReader<LongWritable, Text> getRecordReader(
        InputSplit inputSplit, JobConf jobConf, Reporter reporter)
             throws IOException {
  System.out.println("I am in createRecordReader");
  //MyRecordReader is supposed to handle the record boundary
  return new MyRecordReader((FileSplit)inputSplit, jobConf);
}

EDIT: Sorry, I didn't spot this earlier. As you note, you need to use the new API signature instead:

@Override
public RecordReader<LongWritable, Text> createRecordReader(
        InputSplit split, TaskAttemptContext context) {
  System.out.println("I am in createRecordReader");
  // MyRecordReader is supposed to handle the record boundary;
  // there is no JobConf in the new API, so pass the Configuration instead
  return new MyRecordReader((FileSplit) split, context.getConfiguration());
}

And your MyRecordReader class needs to extend the new-API org.apache.hadoop.mapreduce.RecordReader class, not implement the old org.apache.hadoop.mapred.RecordReader interface.
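
For reference, here is a minimal sketch of what such a class has to implement. The (FileSplit, Configuration) constructor matches the call in createRecordReader above; the single-line reading in nextKeyValue is just a placeholder for your actual multi-line boundary logic:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.util.LineReader;

public class MyRecordReader extends RecordReader<LongWritable, Text> {
    private final FileSplit split;
    private final Configuration conf;
    private LineReader in;
    private long start, pos, end;
    private final LongWritable key = new LongWritable();
    private final Text value = new Text();

    public MyRecordReader(FileSplit split, Configuration conf) {
        this.split = split;
        this.conf = conf;
    }

    @Override
    public void initialize(InputSplit genericSplit, TaskAttemptContext context)
            throws IOException {
        // open the file and seek to the start of this split
        start = split.getStart();
        end = start + split.getLength();
        Path file = split.getPath();
        FileSystem fs = file.getFileSystem(conf);
        FSDataInputStream stream = fs.open(file);
        stream.seek(start);
        in = new LineReader(stream, conf);
        pos = start;
    }

    @Override
    public boolean nextKeyValue() throws IOException {
        // your multi-line record boundary logic belongs here;
        // this placeholder reads a single line per record
        if (pos >= end) {
            return false;
        }
        key.set(pos);
        int bytesRead = in.readLine(value);
        if (bytesRead == 0) {
            return false;
        }
        pos += bytesRead;
        return true;
    }

    @Override
    public LongWritable getCurrentKey() { return key; }

    @Override
    public Text getCurrentValue() { return value; }

    @Override
    public float getProgress() {
        return (end == start) ? 1.0f : (pos - start) / (float) (end - start);
    }

    @Override
    public void close() throws IOException {
        if (in != null) {
            in.close();
        }
    }
}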

  • if I put `@Override`, it gives me an error: `MyInputFormat.java:11: method does not override or implement a method from a supertype @Override.` – Simon Guo Dec 19 '12 at 01:48
  • The error is because your current method signature doesn't override a parent method. Add in the Reporter argument and you should be ok – Chris White Dec 19 '12 at 01:50
  • Pig's `getInputFormat` expects an `org.apache.hadoop.mapreduce.InputFormat`, so the `TextInputFormat` in `MyInputFormat` is `org.apache.hadoop.mapreduce.lib.input.TextInputFormat`, which doesn't have `getRecordReader`, only `createRecordReader`. That's why I use `createRecordReader`. And it still gives me an error. – Simon Guo Dec 19 '12 at 01:57
  • Or should I not extend `TextInputFormat`? If so, which one should I extend? – Simon Guo Dec 19 '12 at 02:05
  • You should extend `org.apache.hadoop.mapreduce.lib.input.TextInputFormat` – zjffdu Dec 19 '12 at 04:36
  • Thanks, I am able to reach my `RecordReader` now. Thanks Chris and zjffdu – Simon Guo Dec 19 '12 at 06:13