Questions tagged [tez]

The Apache Tez™ project is aimed at building an application framework which allows for a complex directed-acyclic-graph of tasks for processing data. It is currently built atop Apache Hadoop YARN.

The 2 main design themes for Tez are:

Empowering end users by:
  • Expressive dataflow definition APIs
  • Flexible Input-Processor-Output runtime model
  • Data type agnostic
  • Simplifying deployment

Execution Performance:
  • Performance gains over Map Reduce
  • Optimal resource management
  • Plan reconfiguration at runtime
  • Dynamic physical data flow decisions

For more details, visit https://tez.apache.org/

94 questions
2
votes
2 answers

Why is hive join taking too long?

I am running code that basically goes like this: Create table abc as select A.* from table1 A Left outer join table2 B on A.col1=B.col1 and A.col2=B.col2; Number of records in table1 = 7009102. Number of records in table2 = 1787493. I have similar…
Djeah
  • 320
  • 8
  • 21
2
votes
0 answers

How to fix intermittent file not found error in Hive using Tez engine

I get an intermittent FileNotFoundException error when I run a query in Hive using the Tez engine. ERROR : Status: Failed ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1508808910527_45616_1_00, diagnostics=[Task failed,…
s d
  • 2,666
  • 4
  • 26
  • 42
2
votes
0 answers

hadoop - How to kill a TEZ job started by hive?

Below is what I could find. The problem is that if we reuse a JDBC Hive session, all the Hive queries run under the same Application-Id. Is there a way I can kill a DAG? Tez jobs can be listed using: yarn application -list Tez jobs can be killed using: yarn…
Himanshu Virmani
  • 2,450
  • 1
  • 24
  • 34
2
votes
2 answers

hive query BlockMissingException

I am having issues on both TEZ and MapReduce execution engines. Both appear related to permissions but for the life of me, I am lost. When I execute it through TEZ I get this message: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain…
Eva Donaldson
  • 385
  • 4
  • 18
2
votes
0 answers

CSV loading from hadoop to hive - scalability issue 20GB/h

I'm looking for a way to speed up loading data from Hadoop into Hive, which I then query with Presto. In my workflow I start with a single CSV file on Hadoop. CREATE EXTERNAL TABLE src_x (KEY INT, X2 INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY…
jangorecki
  • 16,384
  • 4
  • 79
  • 160
2
votes
1 answer

Hive query running infinite time on Tez

Below is the query which I am trying to run on Hive with execution engine as tez. SELECT A.CITY, A.NAME, B.PRICE ,(ROW_NUMBER() OVER (PARTITION BY A.NAME ORDER BY B.PRICE) ) AS RNUM FROM TABLE1 A LEFT JOIN TABLE2 B ON A.NAME = B.NAME …
hadooper
  • 726
  • 1
  • 6
  • 18
2
votes
1 answer

How to get spark-submit hive context running properly with tez and yarn?

I've been trying to get spark-submit to work with org.apache.spark.sql.hive.HiveContext, but I keep coming out with java.lang.NoClassDefFoundError: org/apache/tez/dag/api/SessionNotRunning. Here is the code, which breaks on the last line: val sc =…
1
vote
1 answer

Way to get current editor id (doc id) in HiveServerClient in Hue source code

When multiple Hue pages run Tez applications at the same time, the same session is sometimes applied to two different tasks, which causes one of them to receive a KILL signal while the other complains that the current app master is being…
KAs
  • 1,818
  • 4
  • 19
  • 37
1
vote
0 answers

Failed Vertex re-running

I am trying to run a Hive query on EMR in AWS using Tez, and I am facing the error below. I am running this query on a 20-node cluster. The query is: insert overwrite table temp_table partition(hour) select a,b,c,hour from…
Ravi
  • 21
  • 1
  • 2
1
vote
0 answers

Poor performance on hash joins with Pig on Tez

I have a series of Pig scripts that are transforming hundreds of millions of records from multiple data sources that need to be joined together. Towards the end of each script, I reach a point where JOIN performance becomes terribly slow. Looking at…
Jeff Rosenberg
  • 3,522
  • 1
  • 18
  • 38
1
vote
1 answer

Hive Vertex failed, vertexName=Map

Hello all, I am getting a "Vertex failed, vertexName=Map" error while inserting into a Hive table. Vertex failed, vertexName=Map 1, vertexId=vertex_1514985864009_4176244_3_00, diagnostics=[Task…
Santhosh Chakka
  • 165
  • 2
  • 4
  • 10
1
vote
1 answer

Hive + Tez :: A join query stuck at last 2 mappers for a long time

I have a views table joined with a temp table, with the parameters below intentionally enabled: hive.auto.convert.join=true; hive.execution.engine=tez; The code snippet is: CREATE TABLE STG_CONVERSION AS SELECT CONV.CONVERSION_ID, …
1
vote
1 answer

Hive on tez in EMR schedule tasks very slow

I'm trying to use Hive on Tez to query ORC-format data stored in S3. The Tez AM scheduled tasks very slowly, and a lot of Map tasks remained in "PENDING" for a long time. There were enough resources in the cluster (quite enough I would say. There were more…
Harper
  • 81
  • 7
1
vote
2 answers

Executing query with Hive, Tez and offset

I am trying to execute an OFFSET query in Hive to exclude particular records, but I always get the error below: Query: select * from sample order by id limit 1 OFFSET 1; Error: FAILED: ParseException line 1:41 missing EOF at…
whatsinthename
  • 1,828
  • 20
  • 59
1
vote
1 answer

Yarn container size and Tez container management

I have a cluster with around 15 TB of YARN resources. I am trying to submit a query through Hive. My default container size on YARN is 4 GB. The number of mappers assigned for that query is around 1000. I have been assigned a total of 10% of the resources in my yarn…
user3148326
  • 121
  • 1
  • 7