I have a directory full of MapFiles and now want to run a MapReduce job on them. I use the SequenceFileInputFormat of the new API, which should be aware of MapFiles, as one answer in this thread states. However, this does not work. The job runs up to a certain percentage, and after that I get:
Error: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to com.mycompany.MyOwnWritable
I suppose the mapper trips over the index file. How can I make sure index files are ignored or, better, that only files with the correct input key and value classes are used? The only way that comes to mind is extending Mapper<Object, Object, MyKeyOut, MyValueOut> and using ifs with instanceof checks, but I consider this ugly. Is there a better way to do this?
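
To make concrete what I mean by "ignoring" the index files, here is a minimal sketch of name-based filtering in plain Java, with no Hadoop dependency. It assumes the index file of a MapFile is literally named "index" (which, as far as I know, is what MapFile.INDEX_FILE_NAME holds). The class name MapFileInputFilter and the example paths are hypothetical. In a real job the same predicate would presumably live in a class implementing org.apache.hadoop.fs.PathFilter and be registered with FileInputFormat.setInputPathFilter, but I have not verified that this resolves the exception.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only: decides which paths a job should read.
class MapFileInputFilter {
    // Assumption: a MapFile directory contains a "data" file and an "index" file.
    static final String INDEX_FILE_NAME = "index";

    // Accept every path except those whose last component is the index file.
    // Directories are accepted so that listing can still descend into them.
    static boolean accept(String path) {
        Path name = Paths.get(path).getFileName();
        return name == null || !name.toString().equals(INDEX_FILE_NAME);
    }

    // Apply the predicate to a whole listing.
    static List<String> keepDataFiles(List<String> paths) {
        return paths.stream()
                .filter(MapFileInputFilter::accept)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> listing = List.of(
                "/in/part-00000/data",
                "/in/part-00000/index");
        System.out.println(keepDataFiles(listing)); // prints [/in/part-00000/data]
    }
}
```

This only filters by file name. It does not check the key and value classes recorded in each file's header, which is what I would actually prefer.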