Questions tagged [apache-tez]

The Apache Tez project aims to build an application framework that allows a complex directed acyclic graph (DAG) of tasks to process data. It is currently built atop Apache Hadoop YARN.

See Hive-on-Tez configuration properties.

192 questions
1 vote, 1 answer

TEZ parameter equivalent to "mapreduce.map.failures.maxpercent"

In previous versions of Hadoop, where MapReduce (MR) is used, we set a parameter to ignore a certain percentage of map task failures. In newer versions, where MR is deprecated, is there an equivalent of mapreduce.map.failures.maxpercent in Tez?
Pruthvi
1 vote, 0 answers

PutHiveQL processor with Tez as execution engine sometimes fails

The error in nifi-app.log: ERROR [Timer-Driven Process Thread-3] o.apache.nifi.processors.hive.PutHiveQL PutHiveQL[id=1a203b03-0311-192f-b7a1-9959aea10e41] Failed to update Hive for …
maxime G
1 vote, 1 answer

Apache Hive query on Tez FileNotFoundException

I'm receiving this exception when executing a Hive query on Tez with Hive 2.3.6 and Tez 0.9.2. I know Tez is configured correctly because I can manually run MapReduce jobs via Hadoop. Dag submit failed due to java.io.FileNotFoundException: Path is…
Ben DeMott
1 vote, 3 answers

Hive query shows a few reducers killed but the query is still running. Will the output be correct?

I have a complex query with multiple left outer joins that has been running for the last hour on Amazon EMR. However, a few reducers are shown as Failed and Killed. Why do some reducers get killed, and will the final output be correct?
Srihari Karanth
1 vote, 1 answer

How to change the number of mappers with ORC files using Tez?

I am trying to increase the number of map tasks. The file format is ORC and I am using Tez for processing. The data totals 2.8 GB, in approximately 29 files of roughly 128 MB each. Every time I execute, 28 map tasks are run. I am…
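On Hive-on-Tez, the number of map tasks usually comes from Tez split grouping rather than directly from the number of files, so one knob to try is a smaller maximum group size. A minimal session sketch, assuming Hive on Tez reading ORC; the byte values are illustrative, and whether the task count actually rises also depends on the ORC stripe size and the cluster capacity available at run time:

```sql
-- Tez groups input splits into tasks; smaller groups usually mean more map tasks.
-- Defaults are 16 MB (min-size) and 1 GB (max-size).
SET tez.grouping.min-size=16777216;   -- 16 MB
SET tez.grouping.max-size=67108864;   -- 64 MB

-- With the ETL strategy, ORC splits are computed from stripe metadata,
-- so one large file can be divided into several splits.
SET hive.exec.orc.split.strategy=ETL;
SET mapreduce.input.fileinputformat.split.maxsize=67108864;  -- 64 MB
```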
1 vote, 1 answer

Difference in behaviour while running "count(*)" in Tez and MapReduce

Recently I came across this issue. I had a file at a Hadoop Distributed File System (HDFS) path and a related Hive table. The table had 30 partitions on both sides. I deleted 5 partitions from HDFS and then executed "msck repair table ;" on…
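One possible reason `count(*)` results differ is that Hive can answer `count(*)` from metastore statistics instead of scanning data, and those statistics go stale when partition directories are deleted directly from HDFS. A diagnostic sketch, not a confirmed explanation of the behaviour above; `my_table` and its partition column `dt` are hypothetical:

```sql
-- Force Hive to scan the data instead of answering from stored statistics.
SET hive.compute.query.using.stats=false;
SELECT COUNT(*) FROM my_table;

-- In older Hive releases, MSCK REPAIR TABLE only adds partitions that are
-- missing from the metastore; it does not drop ones deleted from HDFS.
-- Drop a stale metastore partition explicitly (hypothetical value).
ALTER TABLE my_table DROP IF EXISTS PARTITION (dt='2020-01-01');
```

If the scanned count differs from the earlier one, stale statistics rather than the execution engine are the likely cause.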
1 vote, 1 answer

Job name for Tez job in Beeline and viewing it in YARN

I'm using Beeline and would like to set a specific name for a Tez job, as I do with mapreduce.job.name for a MapReduce job. I tried hive.query.name, but it doesn't make any difference in yarn application -list. Some say we can view the name only in the Tez UI,…
ram_23
1 vote, 1 answer

Why does the number of mappers vary so much in Hive for insert overwrite?

I have a transaction-enabled table called t1 and I am using Tez as the execution engine. I fail to understand why there is a huge difference in the query plans for these two queries. What should I do to tune this scenario? select t.id, t.id_type from t…
Puneet Khatod
1 vote, 1 answer

TEZ mapper resource request

We recently migrated from MapReduce to Tez for executing Hive queries on EMR. We are seeing cases where the exact same Hive query launches very different numbers of mappers. See the Map 3 phase below. On the first run it requested 305 resources and on…
kvb
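Tez sizes map-side split groups partly from the cluster capacity available when the DAG starts, scaled by `tez.grouping.split-waves` (default 1.7), so the same query can get a different number of mappers on a busy cluster than on an idle one. A sketch of settings that make the count more predictable, assuming Hive on Tez; the 256 MB value is illustrative:

```sql
-- Tez aims for roughly (available slots x split-waves) map tasks,
-- then clamps the resulting group size between min-size and max-size.
SET tez.grouping.split-waves=1.7;

-- Setting min-size equal to max-size pins the group size (256 MB here),
-- so the mapper count depends mainly on input size, not on cluster load.
SET tez.grouping.min-size=268435456;
SET tez.grouping.max-size=268435456;
```

The trade-off is that a pinned group size no longer adapts upward when the cluster is idle, so very large inputs may run with fewer, longer tasks than Tez would otherwise choose.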
1 vote, 1 answer

How to reduce the number of containers used by a query

I have a query using too many containers and too much memory (97% of the memory is used). Is there a way to set the number of containers used by the query and limit the maximum memory? The query is running on Tez. Thanks in advance
macieira
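On Hive-on-Tez, memory is set per task container, and the number of containers follows from the number of map and reduce tasks. A sketch of session settings for both, assuming Hive on Tez; the sizes are illustrative, the heap-at-about-80%-of-container ratio is a common convention rather than a requirement, and `etl_small` is a hypothetical queue name:

```sql
-- Memory requested from YARN for each Tez task container, in MB.
-- Smaller containers use less memory per task but risk out-of-memory errors.
SET hive.tez.container.size=2048;
-- JVM heap inside each container, kept below the container size (about 80%).
SET hive.tez.java.opts=-Xmx1638m;

-- Cap the number of reducers Hive plans per stage, which also limits
-- how many reduce containers that stage can request.
SET hive.exec.reducers.max=100;

-- Submit to a YARN queue whose capacity limits total containers and memory.
SET tez.queue.name=etl_small;
```

A hard cap on concurrent containers is normally enforced by the YARN scheduler queue rather than by Hive, which is why the queue setting is included.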
1 vote, 1 answer

Query is slow despite adding session settings

The following is the ETL-generated query: SELECT infaHiveSysTimestamp('SS') as a0, 7991 as a1, single_use_subq30725.a1 as a2, SUBSTR(SUBSTR(single_use_subq30725.a2, 0, 5), 0, 5) as a3, CAST(1 AS SMALLINT) as a4, single_use_subq30725.a3 as a5,…
Kumar
1 vote, 1 answer

Hive query hits the same view multiple times; is there a more efficient way to approach this query?

We support an application that runs huge Hive queries triggered via an ETL tool. After the mapping, the query runs on Hive. The query is very big, but its structure looks like this: INSERT INTO Table2 Select t1.f0,…
Kumar
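One approach when the same view is read several times is to factor it into a common table expression (CTE) and ask Hive to materialize it once before the main query. A sketch, assuming a Hive version that supports `hive.optimize.cte.materialize.threshold`; `my_view`, `table2`, `f0`, `f1`, and `key` are hypothetical placeholders, not the query from the question:

```sql
-- Reference-count threshold above which Hive materializes a CTE
-- before running the main query; the default of -1 disables this.
SET hive.optimize.cte.materialize.threshold=1;

-- The view is referenced twice through the CTE but read once.
WITH v AS (
  SELECT f0, f1, key FROM my_view
)
INSERT INTO TABLE table2
SELECT a.f0, b.f1
FROM v a
JOIN v b ON a.key = b.key;
```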
1 vote, 1 answer

Hive ANALYZE query taking a lot of time

In order to speed up ETL queries on large tables, we run many ANALYZE queries on these tables and their date columns in the evening, but these ANALYZE queries on columns take a lot of memory and time. We are using Tez. Is there any way to optimize analyze…
Kumar
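Column statistics are usually the expensive part of ANALYZE, so one option is to limit the work to newly loaded partitions and to the columns the ETL queries actually filter or join on. A sketch, assuming a hypothetical table `sales` partitioned by `dt` with columns `customer_id` and `order_date`:

```sql
-- Cheap basic statistics (file count and size) without scanning the data.
ANALYZE TABLE sales PARTITION (dt='2020-01-31') COMPUTE STATISTICS NOSCAN;

-- Column statistics only for the columns used in filters and joins,
-- and only for the newly loaded partition.
ANALYZE TABLE sales PARTITION (dt='2020-01-31')
  COMPUTE STATISTICS FOR COLUMNS customer_id, order_date;
```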
1 vote, 2 answers

An exception is thrown while running an HQL query in Hive

I want to run a select statement and put the result into a table; I'm sure it's not a syntax error. HQL: INSERT overwrite table datalake_rci.MID_DealerVehicleOutputValue --MIDDealerVehicleOutputValueID…
user2575502
1 vote, 1 answer

Reduce the execution time of a large query

My query takes 30+ minutes to run. It does work on a very large data set, but I may be missing something basic that could reduce the execution time. The query has many reducer stages, each using 1000+ reducers. It runs on Tez…
akash sharma
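With 1000+ reducers per stage, it can be worth letting Tez shrink reducer parallelism at run time and giving each reducer more input. A sketch of session settings to experiment with, assuming Hive on Tez and ORC-backed tables for the vectorization settings; the 256 MB value is illustrative:

```sql
-- Let Tez lower the number of reducers at run time based on actual
-- map output sizes.
SET hive.tez.auto.reducer.parallelism=true;

-- Target input bytes per reducer; a larger value means fewer reducers.
SET hive.exec.reducers.bytes.per.reducer=268435456;  -- 256 MB

-- Vectorized execution processes rows in batches instead of one at a time.
SET hive.vectorized.execution.enabled=true;
SET hive.vectorized.execution.reduce.enabled=true;
```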