Questions tagged [tez]

The Apache Tez™ project is aimed at building an application framework which allows for a complex directed-acyclic-graph of tasks for processing data. It is currently built atop Apache Hadoop YARN.

The 2 main design themes for Tez are:

Empowering end users by:
  • Expressive dataflow definition APIs
  • Flexible Input-Processor-Output runtime model
  • Data type agnostic
  • Simplifying deployment

Execution Performance
  • Performance gains over Map Reduce
  • Optimal resource management
  • Plan reconfiguration at runtime
  • Dynamic physical data flow decisions

For more details, visit https://tez.apache.org/

94 questions
0
votes
0 answers

Join two large tables in Tez

Are there any recommendations to speed up query performance when joining two very large Hive tables (> 2 TB)? The execution engine is Tez. Both tables are unpartitioned and in text format. The cluster has 64 nodes with 128 GB of RAM each.
KBR
  • 464
  • 1
  • 7
  • 24
0
votes
1 answer

Tez build error - shufflehandler

I am installing Tez-0.8.4 based on Hadoop 2.6.0-cdh5.10.0 and got this error. Please help. When I tried to build the Tez source with Maven…
0
votes
1 answer

only 1 map task while extracting data from salesforce

Several extract jobs in Datameer (a rapid ETL/BI tool that sits on top of Hadoop) read data out of Salesforce objects. The largest extract is 1.4 GB (Task object) and the smallest is 96 MB (Account object). Datameer uses REST API Based…
Kfactor21
  • 412
  • 5
  • 14
0
votes
0 answers

When does HIVE (not) use WHERE clause on partition as predicate filter

I have two tables, tbl_a and tbl_b, both formatted as ORC, and partitioned on the column dt. One table uses a partition format %Y%m%d%H%M whereas the other table uses %Y%m%d. When I look at the execution plan of a simple SELECT + WHERE statement in…
0
votes
1 answer

Hive 2.1 doesn't properly evaluate NULL map

I'm trying to find a workaround for checking for NULL maps in Hive 2.1. When I check the keys inside the map, I get the correct result. Here's what I'm trying to evaluate and also the result: hive> select request_params, request_params is null,…
0
votes
0 answers

Difference between Hive with Tez and Pig with Tez

I want to know the difference between Hive with Tez and Pig with Tez. Why can the number of maps and reducers used be seen in the console in Hive with Tez, while in Pig we can see only the number of tasks launched? How to calculate reducers launched in Pig…
Vijay M
  • 11
  • 1
0
votes
2 answers

Hive Container is running beyond physical limits

I frequently get this error message in Hive: "container running beyond physical memory limits. Current usage: 4.7 GB of 4GB physical memory used; 12.1 GB of 8.4 GB virtual memory used. Killing container". I execute this in Scala using a JDBC connection. My…
0
votes
1 answer

Identify number of Mappers & Reducers launched in a Pig script

I'm using Hortonworks' sandbox VM to run Hadoop services. I executed a Pig filter script in Tez mode. Unlike Hive, the Pig log (console) doesn't show any information about the number of mappers & reducers being executed. Am I looking at a wrong…
DhiwaTdG
  • 748
  • 1
  • 10
  • 26
0
votes
0 answers

How to find the CPU time in Tez query execution?

I am currently testing execution time for tables stored in HDFS using both MapReduce and Tez. In MapReduce, I can easily see the CPU time taken for the process (MapReduce example). I am not clear on the results shown for the Tez query (Tez example). Can…
0
votes
0 answers

Access tez job history

I want to get job information from the job history server for the tez execution engine. Currently all map reduce jobs are reflected on the job history server but not the tez ones. Job history is using some kind of logs to get all the information.…
0
votes
0 answers

Hive create table not insert data

I am running the Hive query below. After the MapReduce job completes, I see that no data is inserted. create table t_123 as select * from ( select * from t1 union all select * from t2 union all select * from t3 ) X But if I just run the…
user1206412
  • 63
  • 1
  • 10
0
votes
0 answers

Hive on Tez doesn't work on Hue (Error: Client Version = unknown) after upgrading HDP 2.2 to 2.3

I have upgraded Hortonworks from 2.2 to 2.3 for our Hadoop cluster and made all the required changes to Hue (given in the Hortonworks documentation), but the Hue client has the following issues when accessing Tez on the Hive browser, whereas Tez on the Hive CLI works…
Abhishek Sakhuja
  • 197
  • 2
  • 16
0
votes
0 answers

Hive OOM error when running a query with tez as execution engine

When I run the following query, I get the error below. insert overwrite table mybug select row_number() over (order by clickstream_key) as key, clickstream_key as data from mytest; It is launching around 2 mapper tasks and 240…
Vinay Kumar
  • 1,664
  • 2
  • 15
  • 19
0
votes
1 answer

When to use Hive engine MR and when to use TEZ?

Under what conditions is it preferable to use the Hive engine TEZ over MR? What are the pros and cons of each?
Corey
  • 1,845
  • 1
  • 12
  • 23
0
votes
2 answers

Hive table sorted but inserted without sort

What happens if a table is created with create table X (...) clustered by(date) sorted by (time) but data is inserted without sorting: insert into x select * from raw? Will the data be sorted after it is fetched from raw and before it is inserted? If unsorted data is inserted, what does "sorted by" do in…
Jihun No
  • 1,201
  • 1
  • 14
  • 29