I have a Hive table that receives new data every day, at a rate of around 5 files per day. The table has now accumulated about 800 part files.
The issue I have is that joining or otherwise using this table anywhere triggers 800 mappers, since the number of mappers is proportional to the number of files.
But I do have to use the entire table in my jobs.
Is there a way to use the entire table without triggering so many mappers?
The files look like this (each is only about 100-160 KB):
-rw-rw-r-- 3 XXXX hdfs 106610 2015-12-15 05:39 /apps/hive/warehouse/prod.db/TABLE1/000000_0_copy_1.deflate
-rw-rw-r-- 3 XXXX hdfs 106602 2015-12-23 12:31 /apps/hive/warehouse/prod.db/TABLE1/000000_0_copy_10.deflate
-rw-rw-r-- 3 XXXX hdfs 157686 2016-03-06 05:20 /apps/hive/warehouse/prod.db/TABLE1/000000_0_copy_100.deflate
-rw-rw-r-- 3 XXXX hdfs 163580 2016-03-07 05:22 /apps/hive/warehouse/prod.db/TABLE1/000000_0_copy_101.deflate
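
For context, these are the two approaches I understand might apply: letting Hive combine many small files into one input split, or rewriting the table once into fewer, larger files. The following is only a sketch, assuming Hive running on MapReduce. The size values are illustrative rather than tuned, and I have not verified any of this against this cluster.

```sql
-- Approach 1 (assumption: MapReduce execution engine):
-- combine several small files into a single split so that fewer mappers start.
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
SET mapreduce.input.fileinputformat.split.maxsize=268435456;  -- 256 MB, illustrative

-- Approach 2: one-off compaction that rewrites the table into fewer files.
-- The merge settings ask Hive to merge small output files at the end of the job.
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;
SET hive.merge.smallfiles.avgsize=134217728;  -- 128 MB, illustrative
SET hive.merge.size.per.task=268435456;       -- 256 MB, illustrative

-- Rewrites the table from its own contents; back up the data before trying this.
INSERT OVERWRITE TABLE prod.TABLE1
SELECT * FROM prod.TABLE1;
```

Since the daily loads keep adding small files, the compaction would presumably have to be repeated from time to time.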