Here is my scenario:
hadoop jar test.jar Test inputFileFolder outputFileFolder
where
test.jar sorts the data by key, time, and place
inputFileFolder contains multiple .gz files, each about 10 GB
outputFileFolder contains a bunch of .gz files
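To make the scenario concrete, here is a minimal local sketch of what test.jar is described as doing, applied to one file. It streams a .gz file through java.util.zip.GZIPInputStream, sorts the lines by key, then time, then place, and writes a .gz result with GZIPOutputStream. Several details are assumptions because the question does not give them: the tab-separated key/time/place record layout, the numeric time field, and the names GzipSortSketch and sortGzipFile. The sketch holds every record in memory, so it would not cope with a 10 GB file. In the real Hadoop job the sort happens in the shuffle, and because gzip is not a splittable format, each .gz input file is read by a single map task.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSortSketch {

    // Hypothetical record layout: key<TAB>time<TAB>place (assumed, not given in the question).
    static final class Rec {
        final String key;
        final long time;
        final String place;
        final String line; // original line, written back unchanged

        Rec(String key, long time, String place, String line) {
            this.key = key;
            this.time = time;
            this.place = place;
            this.line = line;
        }
    }

    static Rec parse(String line) {
        String[] f = line.split("\t", -1);
        return new Rec(f[0], Long.parseLong(f[1]), f[2], line);
    }

    // Reads a gzip-compressed text file, sorts its lines by key, then time, then place,
    // and writes the sorted lines to a new gzip-compressed file. Everything is held in memory.
    static void sortGzipFile(Path in, Path out) throws IOException {
        List<Rec> recs = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(in)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                if (!line.isEmpty()) {
                    recs.add(parse(line));
                }
            }
        }

        recs.sort(Comparator.comparing((Rec rec) -> rec.key)
                .thenComparingLong(rec -> rec.time)
                .thenComparing(rec -> rec.place));

        try (BufferedWriter w = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(out)), StandardCharsets.UTF_8))) {
            for (Rec rec : recs) {
                w.write(rec.line);
                w.newLine();
            }
        }
    }
}
```

Usage would be along the lines of GzipSortSketch.sortGzipFile(Path.of("part-0.gz"), Path.of("sorted-0.gz")), where both file names are placeholders.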
My question is: what is the best way to handle the .gz files in inputFileFolder? Thank you!