I have a cluster with around 15 TB of YARN resources, and I am trying to submit a query through Hive. My default YARN container size is 4 GB, and the query is assigned roughly 1,000 mappers. My YARN queue is capped at 10% of the cluster's resources, so only about 430 containers can be allocated at a single point in time, with each mapper taking one container. The HDFS block size is 128 MB. How can I optimize the query?
You've mentioned memory settings, which sound fine, so (since you didn't share the query itself) your next steps to optimize it are:

- Additionally tune the Tez containers.
- Make your HDFS input files approximately the size of an HDFS block.
- Use a different queue if yours is full (`SET tez.queue.name`).
- Partition your Hive tables on the columns that make the most sense for your `WHERE` clauses.
- Ensure the data is stored as ORC with ZLib compression.
- Use LLAP, if possible.

Illustrative sketches of each of these follow below.
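
For the Tez container tuning, here is a minimal sketch of session-level settings, assuming Hive on Tez; the sizes are illustrative and would need to match your 4 GB queue limits:

```sql
-- Run Hive on Tez (usually already the default on modern distributions)
SET hive.execution.engine=tez;

-- Illustrative container sizing: keep the container at the 4 GB YARN
-- allocation and give the JVM roughly 80% of it for heap
SET hive.tez.container.size=4096;        -- MB
SET hive.tez.java.opts=-Xmx3276m;

-- Illustrative split grouping: larger grouped splits mean fewer mappers,
-- so ~1000 tasks fit into fewer waves of ~430 containers
SET tez.grouping.min-size=268435456;     -- 256 MB
SET tez.grouping.max-size=1073741824;    -- 1 GB
```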
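
For the queue and the file-size points, another sketch; the queue name `etl` is hypothetical, and the merge thresholds are only examples:

```sql
-- Hypothetical queue name: pick a less loaded queue you have access to
SET tez.queue.name=etl;

-- Merge small output files toward the 128 MB block size so downstream
-- queries do not launch one mapper per tiny file
SET hive.merge.tezfiles=true;
SET hive.merge.smallfiles.avgsize=134217728;   -- 128 MB
SET hive.merge.size.per.task=268435456;        -- 256 MB
```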
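
For partitioning and ORC storage, a sketch of the DDL; the table, column, and partition names are made up for illustration:

```sql
-- Hypothetical table: partitioned on the column most often filtered in WHERE,
-- stored as ORC with ZLIB compression
CREATE TABLE sales_orc (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DECIMAL(10,2)
)
PARTITIONED BY (order_date STRING)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="ZLIB");

-- Load with dynamic partitioning (illustrative)
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE sales_orc PARTITION (order_date)
SELECT order_id, customer_id, amount, order_date FROM sales_raw;

-- Queries filtering on the partition column only read the matching partitions
SELECT SUM(amount) FROM sales_orc WHERE order_date = '2019-01-01';
```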
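
For LLAP, assuming your cluster already has an LLAP daemon pool configured, a one-line sketch:

```sql
-- Route eligible work to the LLAP daemons (requires LLAP to be running on the cluster)
SET hive.llap.execution.mode=all;
```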

OneCricketeer