Questions tagged [apache-tez]

The Apache Tez project is aimed at building an application framework which allows for a complex directed-acyclic-graph of tasks for processing data. It is currently built atop Apache Hadoop YARN.

See Hive-on-Tez configuration properties.

192 questions
3
votes
1 answer

Tez container size estimation with respect to input split length

So, when Tez chooses the number of mappers to run, it looks at the number of containers which can run in parallel (available slots), a wave factor, rack locality of data, FileInputFormat max split size, Tez max grouping size, stripes which can go into…
Run2
  • 1,839
  • 22
  • 32
3
votes
2 answers

Apache Hive Not Returning YARN Application Results Correctly

I'm running a from-scratch cluster on AWS EC2. I have an external table (partitioned) defined with data on S3. I'm able to query this table and receive results to the console with a simple select * statement: hive> set…
3
votes
0 answers

How bucketing helps in case of more than two tables, if at all it does (Hive Sort Merge Bucket Join)

We are aware of how map join and SMBM join work, reducing the execution time (eliminating the reduce phase, i.e. eliminating the shuffle). Ex: For a join between two tables: select a.col1,b.col2 from a join b on a.col1=b.col1 (both the tables are bucketed on…
user3123372
  • 704
  • 1
  • 10
  • 26
3
votes
1 answer

Hive Tez reducers are running super slow

I have joined multiple tables and the total number of rows is around 25 billion. On top of that, I am doing aggregation. Below are the Hive settings I am using to generate the final output. I am not really sure how to tune the query and…
Teja
  • 13,214
  • 36
  • 93
  • 155
3
votes
1 answer

ORDER BY statement in Hive on Tez throws OOM Exception

I'm trying to use ORDER BY to find the earliest time an entry has been made in my table in Hive. The statement looks like this: SELECT latitude, longitude, timeiss FROM iss ORDER BY timeiss LIMIT 10; This gives me an error message that looks like…
PretendNotToSuck
  • 384
  • 2
  • 10
3
votes
1 answer

Tez VS Spark - huge performance diffs

I'm using HDP 2.6.4 and am seeing huge differences in Spark SQL vs Hive on Tez. Here's a simple query on a table of ~95 M rows: SELECT DT, Sum(1) from mydata GROUP BY DT. DT is the partition column, a string that marks the date. In spark shell, with 15…
3
votes
1 answer

Understanding hive query plan

I have a query and its associated query plan (see gist) for simulated data. The number of rows in the table lte_data_tenmillion is 10000000. The number of rows in the table subscriber data is 100000. For both tables none of the rows have…
Nitin Kumar
  • 765
  • 1
  • 11
  • 26
3
votes
1 answer

Is Tez always better than MR as Hive execution engine?

Is it true that, generally, for smaller queries (expecting results interactively, in minutes rather than hours) Tez performs better, and for batch queries (taking hours) MR performs better as an execution engine? Or can we say that irrespective of…
Dhiraj
  • 3,396
  • 4
  • 41
  • 80
3
votes
0 answers

DataXceiver error processing WRITE_BLOCK operation

Here's the error I get: 2015-12-11 04:01:47,306 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: anmol-vm1-new:50010:DataXceiver error processing WRITE_BLOCK operation src: /10.0.1.193:57002 dst:…
Mona Jalal
  • 34,860
  • 64
  • 239
  • 408
3
votes
5 answers

Apache Tez build fails

I am trying to build Apache Tez (both versions 0.6.1 and 0.7.0) for hadoop-2.6.0 on Windows using the command below: mvn clean package -Dhadoop.version=2.6.0 -DskipTests -Dmaven.javadoc.skip But I am getting the exception below: [INFO] [INFO] ---…
Kumar
  • 3,782
  • 4
  • 39
  • 87
2
votes
1 answer

ORC Split Generation issue with Hive Table

I'm using Hive version 3.1.3 on Hadoop 3.3.4 with Tez 0.9.2. When I create an ORC table that contains splits and try to query it, I get an ORC split generation failed exception. If I concatenate the table, this solves the issue in some cases. In…
Patrick Tucci
  • 1,824
  • 1
  • 16
  • 22
2
votes
1 answer

Hive queries taking so long

I have a CDP environment running Hive. For some reason some queries run pretty quickly while others take more than 5 minutes to run, even a regular select current_timestamp or things like that. I see that my cluster usage is pretty low so I…
EvilQ
  • 23
  • 4
2
votes
1 answer

Is there any scenario where we wouldn't want to reuse tez containers?

I started with Hive and Tez some days back during one of my projects. During that time, I came across the property tez.am.container.reuse.enabled, which many sites recommend keeping set to true. I understand it's due to: Limiting requests…
Anshul Dubey
  • 117
  • 10
2
votes
1 answer

hive alter table concatenate command risks

I have been using the Tez engine to run MapReduce jobs. I have an MR job which takes ages to run, because I noticed I have over 20k files with 1 stripe each, and Tez does not distribute mappers evenly based on the number of files, but rather on the number of stripes.…
9uzman7
  • 409
  • 8
  • 19
2
votes
1 answer

Hive is not accessible via Spark in Kerberos environment: Client cannot authenticate via:[TOKEN, KERBEROS]

Hi all, I'm running Spark (2.4.4) in a Kerberos environment. I've written code to query a Hive table via Spark. I am also doing kinit in the spark-submit command, but I'm still facing java.io.IOException: org.apache.hadoop.security.AccessControlException:…