
I have a map-only program that processes ORC files. In the driver I set OrcNewInputFormat as the input format:

job.setInputFormatClass(OrcNewInputFormat.class); 

With OrcNewInputFormat, the map value is an OrcStruct. The value parameter of the map method is declared as Writable, and it is cast to OrcStruct inside the map method:

OrcStruct record = (OrcStruct) value;
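For context, the mapper looks roughly like this (the class name, output types, and processing body are illustrative, not my exact code):

```java
import java.io.IOException;

import org.apache.hadoop.hive.ql.io.orc.OrcStruct;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative skeleton: OrcNewInputFormat delivers NullWritable keys and
// Writable values that are actually OrcStruct instances.
public class OrcFileMapper extends Mapper<NullWritable, Writable, Text, Text> {

    @Override
    protected void map(NullWritable key, Writable value, Context context)
            throws IOException, InterruptedException {
        // The incoming Writable is cast to OrcStruct before field access
        OrcStruct record = (OrcStruct) value;
        // ... process record fields and write output ...
    }
}
```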

I want to test this mapper with MRUnit. In the setup method of the unit test I create an ORC file at testFilePath:

Writer writer = OrcFile.createWriter(testFilePath,
        OrcFile.writerOptions(conf)
                .inspector(inspector)
                .stripeSize(100000)
                .bufferSize(10000)
                .version(OrcFile.Version.V_0_12));
writer.addRow(new SimpleStruct("k1", "v1"));
writer.close();

public static class SimpleStruct {
    Text k;
    Text string1;

    SimpleStruct(String b1, String s1) {
        this.k = new Text(b1);
        if (s1 == null) {
            this.string1 = null;
        } else {
            this.string1 = new Text(s1);
        }
    }
}

In the test method I then read the file back and invoke the mapper through MRUnit. Here is the code:

// Read the ORC file back
Reader reader = OrcFile.createReader(fs, testFilePath);
RecordReader recordRdr = reader.rows();
OrcStruct row = null;
List<OrcStruct> mapData = new ArrayList<>();

while (recordRdr.hasNext()) {
    row = (OrcStruct) recordRdr.next(row);
    mapData.add(row);
}

// Test the mapper
initializeSerde(mapDriver.getConfiguration());

Writable writable = getWritable(mapData.get(0)); // test the first record
mapDriver.withCacheFile(strCachePath).withInput(NullWritable.get(), writable);
mapDriver.runTest();
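The getWritable helper is my own code; it essentially wraps the row through OrcSerde, roughly like this (sketch only, and the inspector argument is how I pass in the same ObjectInspector used when writing the file):

```java
import org.apache.hadoop.hive.ql.io.orc.OrcSerde;
import org.apache.hadoop.hive.ql.io.orc.OrcStruct;
import org.apache.hadoop.hive.serde2.SerDeException;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.io.Writable;

// Illustrative sketch of the getWritable helper (my real code may differ).
// OrcSerde.serialize wraps the row in an OrcSerdeRow, which is the object
// whose write() method appears in the stack trace below.
public class WritableHelper {
    private final OrcSerde serde = new OrcSerde();

    public Writable getWritable(OrcStruct row, ObjectInspector inspector)
            throws SerDeException {
        return serde.serialize(row, inspector);
    }
}
```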

But running the test case fails with the error below:

java.lang.UnsupportedOperationException: can't write the bundle
at org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow.write(OrcSerde.java:61)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
at org.apache.hadoop.mrunit.internal.io.Serialization.copy(Serialization.java:80)
at org.apache.hadoop.mrunit.internal.io.Serialization.copy(Serialization.java:97)
at org.apache.hadoop.mrunit.internal.io.Serialization.copyWithConf(Serialization.java:110)
at org.apache.hadoop.mrunit.TestDriver.copy(TestDriver.java:675)
at org.apache.hadoop.mrunit.TestDriver.copyPair(TestDriver.java:679)
at org.apache.hadoop.mrunit.MapDriverBase.addInput(MapDriverBase.java:120)
at org.apache.hadoop.mrunit.MapDriverBase.withInput(MapDriverBase.java:210)

Looking at OrcSerde, I can see that OrcSerdeRow.write simply throws UnsupportedOperationException, and MRUnit invokes it when copying the input pair, so the test errors out.

How can I unit test a mapper that processes ORC files? Is there another approach, or does something in my setup need to change?

Thanks in advance for the help.
