
I have a cluster with around 15 TB of YARN resources. I am trying to submit a query through Hive. My default container size on YARN is 4 GB. The number of mappers assigned for that query is around 1000. I have been assigned a total of 10% of the resources in my YARN queue, so only 430 containers will be allocated at a single point in time. Each mapper is assigned one container. The block size on HDFS is 128 MB. How can I optimize the query?
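(For reference, that concurrency cap is roughly the queue's share of total YARN memory divided by the container size, i.e. max concurrent containers ≈ (total YARN memory × queue share) / container size; vcore limits and the Application Master's own container can reduce it a little further.)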

user3148326

1 Answer


You've mentioned memory settings, which sound fine. Since you didn't share the query itself, your next steps to optimize are:

  • Tune the Tez containers further (see the first sketch after this list).
  • Make your HDFS input files approximately the size of an HDFS block (see the split-grouping sketch below).
  • Use a different queue if yours is full (SET tez.queue.name).
  • Partition your Hive tables on the columns that make the most sense for your WHERE clauses (see the DDL sketch below).
  • Ensure the data is stored as ORC with ZLIB compression (covered in the same sketch).
  • Use LLAP, if possible (see the last sketch).
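A minimal sketch of the Tez container knobs, assuming Hive on Tez; the values are only illustrative and should track your actual 4 GB container size:

    -- Tez task container size in MB (align with the YARN container size)
    SET hive.tez.container.size=4096;
    -- JVM heap for the task, usually around 80% of the container
    SET hive.tez.java.opts=-Xmx3276m;
    -- Memory for the Tez Application Master
    SET tez.am.resource.memory.mb=4096;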
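To keep each mapper working on roughly one HDFS block, you can group splits at read time and merge small files at write time; a sketch, assuming 128 MB / 256 MB targets (values are in bytes):

    -- Group input splits to between one and two HDFS blocks
    SET tez.grouping.min-size=134217728;
    SET tez.grouping.max-size=268435456;
    -- Merge small output files after the query so downstream reads stay block-sized
    SET hive.merge.tezfiles=true;
    SET hive.merge.smallfiles.avgsize=134217728;
    SET hive.merge.size.per.task=268435456;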
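A sketch of the partitioned, ZLIB-compressed ORC layout; the table and column names (sales_raw, sales_orc, sale_date) are made up for illustration:

    CREATE TABLE sales_orc (
      order_id BIGINT,
      amount   DECIMAL(10,2)
    )
    PARTITIONED BY (sale_date STRING)
    STORED AS ORC
    TBLPROPERTIES ("orc.compress"="ZLIB");

    -- Allow dynamic partitioning when loading from the unpartitioned source
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;

    INSERT OVERWRITE TABLE sales_orc PARTITION (sale_date)
    SELECT order_id, amount, sale_date FROM sales_raw;

Queries that filter on sale_date in the WHERE clause will then read only the matching partitions instead of scanning the whole table.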
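If LLAP daemons are installed and running on the cluster (e.g. on HDP/HDInsight), you can push work into them; a sketch:

    -- Run queries against the LLAP daemons instead of plain containers
    SET hive.execution.mode=llap;
    -- Let all operators, not just scans, execute inside LLAP
    SET hive.llap.execution.mode=all;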
OneCricketeer