I am using MapReduce to process my data, and I need the output to be stored under date partitions. My sort key is a date string. I override getPartition in my custom partitioner class to return the following:
return (formattedDate.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
Because this uses a hash followed by a modulo, different dates can return the same integer value.
For example:
Let's say numReduceTasks=100
The date 2018-01-20 might have a hash value of 101, so 101 % 100 = 1.
Another date, 2018-02-20, might have a hash value of 201, so 201 % 100 = 1.
Because of this, files for multiple dates end up in a single date partition, which is not what I want. Any pointers on how to handle this?
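
To make the collision concrete, here is a minimal standalone sketch in plain Java (no Hadoop dependencies) that applies the same expression to every date in 2018 and lists the partitions that receive more than one date. The class name PartitionCollisionDemo and its helper methods are hypothetical, made up for illustration. The hash values 101 and 201 are the illustrative numbers from my example, not the real hashCode() results for those dates.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; it only mirrors the arithmetic of the getPartition body above.
public class PartitionCollisionDemo {

    // Same expression as in getPartition, applied to a raw hash value.
    static int bucket(int hash, int numReduceTasks) {
        return (hash & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Partition index for a formatted date string such as "2018-01-20".
    static int partitionFor(String formattedDate, int numReduceTasks) {
        return bucket(formattedDate.hashCode(), numReduceTasks);
    }

    // Groups every date of the given year by the partition it lands in.
    static Map<Integer, List<String>> groupByPartition(int year, int numReduceTasks) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        LocalDate day = LocalDate.of(year, 1, 1);
        while (day.getYear() == year) {
            String formattedDate = day.toString(); // ISO format, yyyy-MM-dd
            int partition = partitionFor(formattedDate, numReduceTasks);
            groups.computeIfAbsent(partition, k -> new ArrayList<>()).add(formattedDate);
            day = day.plusDays(1);
        }
        return groups;
    }

    public static void main(String[] args) {
        int numReduceTasks = 100;

        // The illustrative hash values from the example: both map to partition 1.
        System.out.println("hash 101 -> partition " + bucket(101, numReduceTasks));
        System.out.println("hash 201 -> partition " + bucket(201, numReduceTasks));

        // Real date strings: print every partition shared by more than one date.
        Map<Integer, List<String>> groups = groupByPartition(2018, numReduceTasks);
        for (Map.Entry<Integer, List<String>> entry : groups.entrySet()) {
            if (entry.getValue().size() > 1) {
                System.out.println("partition " + entry.getKey() + " <- " + entry.getValue());
            }
        }
    }
}
```

With 365 distinct dates and only 100 reduce tasks, some partitions must receive more than one date no matter which hash function is used, so the sharing is not specific to hashCode().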