I have a streaming map-reduce job. I have some 30 slots for processing. Initially I get a single input file containing 60 records (fields are tab separated), first field of every record is a number, for first record number(first field) is 1, for second record number(first field) is 2 and so on. I want to create 30 files from these records for next step of processing, each containing 2 records each (even distribution).
For this to work I specified number of reducers to hadoop job as 30. I expected that first field will be used as key and I will get 30 output files each containing 2 records.
I do get 30 output files but not all containing same number of records. Some files are even empty (zero size). Any idea