Questions tagged [apache-tez]

The Apache Tez project aims to build an application framework that supports complex directed acyclic graphs (DAGs) of tasks for processing data. It is currently built atop Apache Hadoop YARN.

See Hive-on-Tez configuration properties.

192 questions
0
votes
1 answer

Hive Joins on String Performance

We are using Hive 3.1.3 and running queries on the Tez engine (Hive on Tez) on the Cloudera platform. We are still in the early phase, and we have a situation where we want to join several tables together on string columns. For example table…
speroo
  • 31
  • 4
0
votes
0 answers

Tez session is created every time a Spark job runs

I am running a Spark (Scala) job on an HDP cluster. However, every time the job executes (in both client and cluster mode), a parallel Tez session is also created and an application is submitted to YARN. As part of the Spark job, a couple of SQL jobs which are being…
0
votes
0 answers

PySpark insert overwrite with dynamic partition is very slow

I am reading a 60 GB CSV file using PySpark, doing a few basic transformations, and loading it into a Hive dynamically partitioned table. The HDFS block size is 128 MB, so 400+ partitions are created in Spark. The transformation completes in a few minutes. But…
Raja
  • 507
  • 1
  • 6
  • 24
0
votes
1 answer

How to run more than one query concurrently in Hue?

If I start the second query, the first one is killed automatically. Is there any Hue/Hive/Tez/YARN config that controls this maximum concurrency? Error message: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED:…
cnby
  • 389
  • 4
  • 15
0
votes
0 answers

Hive SQL using Tez as the execution engine gives no result on an empty partition

Hive SQL queries on empty partitions give no result instead of 0 rows or the actual value. For example: --Create external Table Create external table test_tbl ( name string) partitioned by ( company string, processdate string) stored as orc location…
dinesh028
  • 2,137
  • 5
  • 30
  • 47
0
votes
1 answer

Apache Tez tasks on hold at the Application Master

I have a Tez problem: when running about 14 queries at the same time, some of them are delayed by more than 5 minutes, but the cluster utilization is just 14%. This is the message that I am talking about: INFO SessionState:…
Marco
  • 1,172
  • 9
  • 24
0
votes
0 answers

A job failure occurs while proceeding with the reduce phase with Hive on Tez

Env: Hive 3.1.2, Tez 0.10.2, Hadoop 3.2.1. Trouble: I am using Tez with Hive LLAP. Tez setup was carried out according to the official documentation: https://tez.apache.org/install.html LLAP works well, but the Tez job continues to fail. When running in…
hoon
  • 43
  • 1
  • 9
0
votes
1 answer

java.lang.OutOfMemoryError: Java heap space error while executing Hive query

While running a Hive query from the Hive shell using the Tez execution engine, I get a java.lang.OutOfMemoryError: Java heap space error in the logs, but the query completes in the end. I want to understand why I am getting this error in…
gkarya42
  • 429
  • 6
  • 22
0
votes
0 answers

Sorting a string column that contains numbers in it using Hive

I have created a Hive table: CREATE external TABLE test.partordtst (name string,age string,priority string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '~' LOCATION '/DataprovidersDEV/Tom/Hive/Test/'; The values in the file that is used for…
TomG
  • 281
  • 1
  • 2
  • 20
0
votes
1 answer

How Hive manages non-Tez and non-MapReduce queries

Create table t1(id int) I ran the above query on Hive 2.3.6 (MapR Hadoop Distribution 6.3.0). The default Hive engine was Tez. After running the query, I was not able to see any Tez application launched in the YARN ResourceManager web UI. So…
Pash0002
  • 110
  • 2
  • 14
0
votes
1 answer

Hive query throws "code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask" exception when query has GROUP BY clause

I have Hive + LLAP on HDP 3.1.4. The Hive and Tez config is: yarn.nodemanager.resource.memory-mb = 40960 yarn.scheduler.minimum-allocation-mb = 1024 yarn.scheduler.maximum-allocation-mb = 40960 hive.tez.container.size =…
Arash Amani
  • 26
  • 1
  • 4
0
votes
1 answer

Hive returning error while running insert query

I am trying to run an insert query and get the following error using MapReduce: Application application_1609169302439_0001 failed 2 times due to AM Container for appattempt_1609169302439_0001_000002 exited with exitCode: 1 Failing this…
0
votes
1 answer

Failure after Tez shuffle handler setup: Map operator initialization failed

After setting up the Tez Shuffle Handler following the instructions at https://tez.apache.org/shuffle-handler.html I'm getting the error below during query execution: Vertex failed, vertexName=Map 2, vertexId=vertex_1608273679503_0002_2_01, diagnostics=[Task…
kuldeep mishra
  • 154
  • 2
  • 11
0
votes
2 answers

Hive Tez query fails with java.io.IOException

When executing a long-running Hive Tez query, it rarely fails with: java.io.IOException: File hdfs://XXX with newer attempt ID 1 is smaller than the file hdfs://YYY with older attempt ID 0 In our 20-node HDP 3.1.5 cluster (Hive 3.1.0 and Tez 0.9.1), it…
Javier
  • 74
  • 7
0
votes
1 answer

How to control the number of containers in Hive-on-Tez

I'm new to using the Tez engine. I'm running Hive queries on Tez, and the query seems to use all the available resources. I'd like to know if there is any way to control the number of running containers. For example, how we control it in Spark using…