Questions tagged [apache-tez]

The Apache Tez project is aimed at building an application framework which allows for a complex directed-acyclic-graph of tasks for processing data. It is currently built atop Apache Hadoop YARN.

See Hive-on-Tez configuration properties.

192 questions

1 answer

Tez container size estimation with respect to input split length

So, when Tez chooses the number of mappers to run, it looks at the number of containers that can run in parallel (available slots), a wave factor, rack locality of data, FileInputFormat max split size, Tez max grouping size, stripes which can go into…

asked Sep 25 '20 at 06:52

Run2

2 answers

Apache Hive Not Returning YARN Application Results Correctly

I'm running a from-scratch cluster on AWS EC2. I have an external table (partitioned) defined with data on S3. I'm able to query this table and receive results to the console with a simple select * statement: hive> set…

hadoop hive hadoop-yarn apache-tez

asked Nov 05 '19 at 21:59

Aaron Grubb

0 answers

How does bucketing help with more than two tables, if at all? (Hive Sort Merge Bucket Join)

We are aware of how map join and SMBM join work to reduce execution time (eliminating the reduce phase, i.e. eliminating the shuffle). Ex: for a join between two tables, select a.col1, b.col2 from a join b on a.col1 = b.col1 (both the tables are bucketed on…

join hive partitioning apache-tez

asked Jun 17 '19 at 08:28

user3123372

1 answer

Hive Tez reducers are running super slow

I have joined multiple tables and the total number of rows is around 25 billion. On top of that, I am doing aggregation. Below are the Hive settings I am using to generate the final output. I am not really sure how to tune the query and…

hadoop hive query-optimization hiveql apache-tez

asked Feb 02 '19 at 08:12

Teja

1 answer

ORDER BY statement in Hive on Tez throws OOM Exception

I'm trying to use ORDER BY to find the earliest time an entry has been made in my table in Hive. The statement looks like this: SELECT latitude, longitude, timeiss FROM iss ORDER BY timeiss LIMIT 10; This gives me an error message that looks like…

hadoop hive out-of-memory hiveql apache-tez

asked Dec 17 '18 at 14:36

PretendNotToSuck

1 answer

Tez VS Spark - huge performance diffs

I'm using HDP 2.6.4 and am seeing huge differences in Spark SQL vs Hive on Tez. Here's a simple query on a table of ~95 M rows: SELECT DT, Sum(1) from mydata GROUP BY DT. DT is the partition column, a string that marks the date. In spark shell, with 15…

apache-spark hive apache-spark-sql hortonworks-data-platform apache-tez

asked Nov 19 '18 at 11:06

hummingBird

1 answer

Understanding hive query plan

I have a query and its associated query plan (see gist) for simulated data. The number of rows in the table lte_data_tenmillion is 10000000. The number of rows in the table subscriber data is 100000. For both tables none of the rows have…

hadoop hive hortonworks-data-platform apache-tez

asked Sep 19 '16 at 10:48

Nitin Kumar

1 answer

Is Tez always better than MR as Hive execution engine?

Is it true that, generally, Tez performs better for smaller queries (expecting results interactively, in minutes rather than hours) and MR performs better as an execution engine for batch queries (taking hours)? Or can we say that irrespective of…

hive mapreduce hiveql apache-tez

asked Aug 24 '16 at 18:58

Dhiraj

0 answers

DataXceiver error processing WRITE_BLOCK operation

Here's the error I get: 2015-12-11 04:01:47,306 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: anmol-vm1-new:50010:DataXceiver error processing WRITE_BLOCK operation src: /10.0.1.193:57002 dst:…

ubuntu hadoop hdfs hadoop-yarn apache-tez

asked Dec 11 '15 at 04:52

Mona Jalal

5 answers

Apache Tez build fails

I am trying to build Apache Tez (both versions 0.6.1 and 0.7.0) for hadoop-2.6.0 on Windows using the command below: mvn clean package -Dhadoop.version=2.6.0 -DskipTests -Dmaven.javadoc.skip But I am getting the exception below: [INFO] [INFO] ---…

hadoop maven-3 apache-tez

asked Jul 13 '15 at 09:56

Kumar

1 answer

ORC Split Generation issue with Hive Table

I'm using Hive version 3.1.3 on Hadoop 3.3.4 with Tez 0.9.2. When I create an ORC table that contains splits and try to query it, I get an ORC split generation failed exception. If I concatenate the table, this solves the issue in some cases. In…

hadoop hive orc apache-tez

asked Nov 06 '22 at 13:08

Patrick Tucci

1 answer

Hive queries taking so long

I have a CDP environment running Hive. For some reason, some queries run pretty quickly while others take more than 5 minutes to run, even a regular select current_timestamp or things like that. I see that my cluster usage is pretty low so I…

hive hadoop-yarn apache-tez

asked Jan 29 '22 at 17:00

EvilQ

1 answer

Is there any scenario where we wouldn't want to reuse tez containers?

I started with Hive and Tez some days back during one of my projects. During that time, I came across the property tez.am.container.reuse.enabled, which many sites recommend keeping set to true. I understand it's due to: Limiting requests…

hive apache-tez

asked Oct 20 '21 at 16:58

Anshul Dubey

1 answer

hive alter table concatenate command risks

I have been using the Tez engine to run map reduce jobs. I have an MR job which takes ages to run, because, as I noticed, I have over 20k files with 1 stripe each, and Tez does not evenly distribute mappers based on the number of files, but rather on the number of stripes.…

hive mapreduce concatenation orc apache-tez

asked May 11 '21 at 14:30

9uzman7

1 answer

Hive is not accessible via Spark In Kerberos Environment : Client cannot authenticate via:[TOKEN, KERBEROS]

Hi All, I'm running Spark (2.4.4) in a Kerberos environment. I've written code to query a Hive table via Spark. I am also doing kinit in the spark-submit command, but I'm still facing java.io.IOException: org.apache.hadoop.security.AccessControlException:…

apache-spark hadoop hive apache-tez kerberos-delegation

asked Aug 26 '20 at 11:35

Deepak Singh
