I'm working on a project to learn cascading, and I'm stumped on this problem. Cascading doesn't seem to have anything to read the first line of each individual file in a directory, which I need to do in order to discover the content type from text analysis. I've looked through the cascading API documentation for several hours, and nothing has jumped out to me as useful, and Google doesn't produce a similar question on another forum.
I'd like to avoid running the jar for each file individually. So, instead of:
hadoop jar myApp.jar this.package.myApp inputPath/file.txt outputPath/
I'd like this:
hadoop jar myApp.jar this.package.myApp inputPath/ outputPath/