I'm trying to run a MapReduce job over a large set of preexisting binary files whose format I can't change.
Should I write my own InputFormat for this? If so, how can I write a minimal InputFormat that just returns an InputStream for each file, so that I can process the file myself?