Hive on Tez in EMR schedules tasks very slowly

Question

I'm trying to use Hive on Tez to query ORC-format data stored in S3. The Tez AM schedules tasks very slowly, and a lot of map tasks remain in "PENDING" for a long time.

The cluster had more than enough resources: over 6 TB of memory and over 1,000 vcores were available, each container in this job needs only 2 GB of memory, and this was the only job running on the YARN cluster. Even so, the AM was slow to schedule tasks.

Is there any way I can accelerate this procedure?

score 0 · Answer 1 · answered Dec 17 '18 at 15:19

I had the same problem.

I resolved it by changing Hive's execution engine.

Try this command:

set hive.execution.engine = mr;
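
A minimal sketch of how this could look in a Hive session, assuming you only want to switch the engine for the current session and compare it against Tez. The table name `my_orc_table` is a hypothetical placeholder, not something from the question:

```sql
-- Print the engine currently in effect for this session
SET hive.execution.engine;

-- Switch this session to MapReduce
SET hive.execution.engine=mr;

-- Run the query as usual (my_orc_table is a placeholder name)
SELECT COUNT(*) FROM my_orc_table;

-- Switch back to Tez to compare the two runs
SET hive.execution.engine=tez;
```

A `SET` issued this way applies to the current session only, so it should not affect other jobs on the cluster.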

In some cases MR is better than Tez.

AWS recommends Tez, but it is not the best choice in every case. You can use MapReduce instead.

https://docs.amazonaws.cn/en_us/emr/latest/ReleaseGuide/emr-hive-differences.html
