I have a cluster with around 15 TB of YARN resources, and I am trying to submit a query through Hive. My default YARN container size is 4 GB, and the query is assigned roughly 1,000 mappers. My YARN queue is capped at 10% of the cluster's resources, so only about 430 containers can be allocated at a single point in time, with each mapper taking one container. The HDFS block size is 128 MB. How can I optimize the query?
You've mentioned memory settings, which sound fine, so (since you didn't share the query itself) your next steps to optimize it are:

- Additionally tune the Tez containers.
- Make your HDFS input files approximately the size of an HDFS block.
- Use a different queue if yours is full (`SET tez.queue.name`).
- Partition your Hive tables on the columns that make the most sense for your `WHERE` clauses.
- Ensure the data is stored as ORC with ZLib compression.
- Use LLAP, if possible.

Illustrative sketches of each of these follow below.
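
For the Tez container tuning, here is a minimal sketch of session-level settings, assuming Hive on Tez; the sizes are illustrative and would need to match your 4 GB queue limits:

```sql
-- Run Hive on Tez (usually already the default on modern distributions)
SET hive.execution.engine=tez;

-- Illustrative container sizing: keep the container at the 4 GB YARN
-- allocation and give the JVM roughly 80% of it for heap
SET hive.tez.container.size=4096;        -- MB
SET hive.tez.java.opts=-Xmx3276m;

-- Illustrative split grouping: larger grouped splits mean fewer mappers,
-- so ~1000 tasks fit into fewer waves of ~430 containers
SET tez.grouping.min-size=268435456;     -- 256 MB
SET tez.grouping.max-size=1073741824;    -- 1 GB
```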
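
For the queue and the file-size points, another sketch; the queue name `etl` is hypothetical, and the merge thresholds are only examples:

```sql
-- Hypothetical queue name: pick a less loaded queue you have access to
SET tez.queue.name=etl;

-- Merge small output files toward the 128 MB block size so downstream
-- queries do not launch one mapper per tiny file
SET hive.merge.tezfiles=true;
SET hive.merge.smallfiles.avgsize=134217728;   -- 128 MB
SET hive.merge.size.per.task=268435456;        -- 256 MB
```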
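
For partitioning and ORC storage, a sketch of the DDL; the table, column, and partition names are made up for illustration:

```sql
-- Hypothetical table: partitioned on the column most often filtered in WHERE,
-- stored as ORC with ZLIB compression
CREATE TABLE sales_orc (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DECIMAL(10,2)
)
PARTITIONED BY (order_date STRING)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="ZLIB");

-- Load with dynamic partitioning (illustrative)
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE sales_orc PARTITION (order_date)
SELECT order_id, customer_id, amount, order_date FROM sales_raw;

-- Queries filtering on the partition column only read the matching partitions
SELECT SUM(amount) FROM sales_orc WHERE order_date = '2019-01-01';
```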
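
For LLAP, assuming your cluster already has an LLAP daemon pool configured, a one-line sketch:

```sql
-- Route eligible work to the LLAP daemons (requires LLAP to be running on the cluster)
SET hive.llap.execution.mode=all;
```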

OneCricketeer