I am trying to increase the number of map tasks for a query. The file format is ORC and I am using Tez for processing.

I have about 2.8 GB of data: approximately 29 files of roughly 128 MB each.

Every time I run the query, 28 map tasks are executed. How can I increase the map task count?

Thanks in advance

1 Answer

Check these settings (see comments below):

set hive.tez.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

set tez.grouping.min-size=16777216; -- lower bound on a grouped split (16 MB); smaller splits are combined if possible
set tez.grouping.max-size=67108864; -- upper bound on a grouped split (64 MB; default is 1 GB); lowering it produces smaller groups, so more mappers are started
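
To get a feel for what these two values imply, here is a rough back-of-the-envelope sketch in Python. It is not how Tez actually computes its groups: the real grouper also takes cluster capacity and the ORC stripe layout into account. The function name `estimate_mapper_bounds` is made up for illustration, and the only assumption is that every grouped split ends up between min-size and max-size.

```python
import math


def estimate_mapper_bounds(total_bytes, min_size, max_size):
    """Rough (lower, upper) bounds on the number of grouped splits.

    Assumes every group is between min_size and max_size bytes.
    This is a simplification, not Tez's actual grouping algorithm.
    """
    if not 0 < min_size <= max_size:
        raise ValueError("expected 0 < min_size <= max_size")
    # No group may exceed max_size, so at least this many groups are needed.
    lower = max(1, math.ceil(total_bytes / max_size))
    # No group may be smaller than min_size, so at most this many groups fit.
    upper = max(1, total_bytes // min_size)
    return lower, upper


# About 2.8 GB of input, as in the question.
total = int(2.8 * 1024 ** 3)

# Settings from this answer: min 16 MB, max 64 MB.
print(estimate_mapper_bounds(total, 16777216, 67108864))    # (45, 179)

# Same min-size with the default max-size of 1 GB.
print(estimate_mapper_bounds(total, 16777216, 1073741824))  # (3, 179)
```

With the 64 MB maximum, the data cannot fit in fewer than about 45 groups, which is already more than the 28 mappers you are seeing. The actual number depends on how many splits the ORC input format produces in the first place.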

You can also control the number of mappers with this setting:

set mapreduce.job.maps=128; -- it is better to use the split-grouping configuration (above) instead of this one, because it is more flexible
leftjoin