I have a Hive table partitioned by Year/Month, and it contains data for at least 7 years. What I want to do is compress the latest data (up to about 1 year old) with Snappy, but the older data with a better compression technique such as gzip. How…
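One possible approach, sketched under assumptions: keep Snappy as the default for new data, and rewrite each older partition with gzip by overwriting it partition by partition. The table `sales`, its partition columns `year`/`month` and the columns `col1`, `col2` are hypothetical. This assumes a text or SequenceFile table; ORC and Parquet tables take their codec from table properties such as `orc.compress` instead.

```sql
-- Hypothetical table sales(col1, col2) partitioned by (year, month), stored as text.
-- Compress this session's job output with gzip instead of Snappy.
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec;

-- Rewrite one old partition in place; repeat (or script) for every partition older than a year.
INSERT OVERWRITE TABLE sales PARTITION (year = 2012, month = 1)
SELECT col1, col2
FROM sales
WHERE year = 2012 AND month = 1;
```

For text files the codec is detected per file from its extension, so gzip and Snappy partitions can coexist in one table. Keep in mind that gzip files are not splittable, so each file is read by a single mapper.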
I have a question about Hive map joins. I know that when a small table is joined with a big table, a map join is better, but when I have a query like this:
select a.col1,
a.col2,
a.col3,
/* many more columns from table a, omitted here */
…
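A minimal sketch of the two usual ways to get a map join, assuming hypothetical tables `big_table` and `small_table` joined on `id`: let Hive convert the join automatically when the small side is under a size threshold, or request it with a hint. Newer Hive versions ignore the hint unless `hive.ignore.mapjoin.hint` is set to false.

```sql
-- Automatic conversion: the smaller input is loaded into an in-memory hash table
SET hive.auto.convert.join=true;
SET hive.mapjoin.smalltable.filesize=25000000;  -- size threshold in bytes for the small table

-- Explicit hint (only honored when hive.ignore.mapjoin.hint=false)
SET hive.ignore.mapjoin.hint=false;
SELECT /*+ MAPJOIN(b) */ a.col1, a.col2, b.col3
FROM big_table a
JOIN small_table b ON a.id = b.id;
```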
Many users are used to running 'select * from table' in Oracle/MySQL.
But I don't want to allow them to query like that in Hive.
Is there any way to prevent a full table scan in Hive,
like a trigger or something else?
Thanks a lot!
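Hive does not offer triggers, but a partial safeguard, sketched here with a hypothetical partitioned table `sales`, is strict mode: it rejects queries on partitioned tables that have no partition filter (as well as ORDER BY without LIMIT and Cartesian joins). It does not stop a full scan of an unpartitioned table, and users can switch it off again unless the property is locked through `hive.conf.restricted.list`. Newer releases replace `hive.mapred.mode` with separate `hive.strict.checks.*` settings, so check the configuration reference for your version.

```sql
SET hive.mapred.mode=strict;

-- Rejected in strict mode: no predicate on the partition columns of sales
SELECT * FROM sales;

-- Allowed: the partition filter limits the scan to one year
SELECT * FROM sales WHERE year = 2017;
```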
I want to alter thousands of tables in a Hive database, but some of these tables exist and some don't. When I execute the .sql file, Hive exits as soon as it hits a table that is not present. Please help me skip or ignore the queries whose table is not…
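A minimal sketch, assuming the Hive CLI: the `hive.cli.errors.ignore` property tells the CLI to keep executing a script after a statement fails, and passing it on the command line ensures it is in effect from the first statement. With Beeline, the corresponding option is `--force=true`. The file name and table names below are hypothetical.

```sql
-- alter_tables.sql (hypothetical), run with:
--   hive --hiveconf hive.cli.errors.ignore=true -f alter_tables.sql
-- A failing statement is reported and the CLI continues with the next one.
ALTER TABLE db1.table_a SET TBLPROPERTIES ('comment' = 'updated');
ALTER TABLE db1.table_missing SET TBLPROPERTIES ('comment' = 'updated');  -- table does not exist: fails
ALTER TABLE db1.table_b SET TBLPROPERTIES ('comment' = 'updated');
```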
If I have a query in Hive that uses a JOIN, let's say a LEFT OUTER JOIN or an INNER JOIN on two tables ON any column, how do I know which type of JOIN it gets converted into in the back-end MapReduce (i.e. Map-side JOIN or Reduce-side…
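One way to check, sketched with hypothetical tables `orders` and `customers`: run EXPLAIN on the query and look at which operator performs the join. A map-side join appears as a Map Join Operator inside a map stage, while a reduce-side (common) join appears as a Join Operator in a Reduce Operator Tree. With automatic conversion enabled, the plan may instead contain a conditional stage listing both alternatives, in which case the choice is made at run time.

```sql
EXPLAIN
SELECT o.order_id, c.name
FROM orders o
LEFT OUTER JOIN customers c ON o.customer_id = c.id;
-- In the output:
--   "Map Join Operator" in a map stage           => map-side join
--   "Join Operator" under "Reduce Operator Tree" => reduce-side (common) join
```

For a LEFT OUTER JOIN, only the right-hand table can be the in-memory side of a map join, because every row of the left table must be streamed.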
It is possible to enable a Fetch task in Hive for simple queries, instead of a Map-only or MapReduce job, using the hive.fetch.task.conversion parameter.
Please explain why a Fetch task runs much faster than a Map job, especially when doing simple work (for…
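The speed difference comes mostly from overhead rather than from the work itself: a Fetch task reads the table files directly in the client process, while a Map job must be submitted, scheduled and have containers launched before it reads a single row, which typically costs seconds on a real cluster. A minimal sketch of the settings, with a hypothetical table and columns: `minimal` converts only SELECT *, filters on partition columns and LIMIT; `more` also allows arbitrary column projections and filters; the threshold makes Hive fall back to a job for large inputs.

```sql
SET hive.fetch.task.conversion=more;                  -- none | minimal | more
SET hive.fetch.task.conversion.threshold=1073741824;  -- input size limit in bytes (1 GB)

-- Eligible for a Fetch task: projection, filter and LIMIT only, no aggregation or join
SELECT col1, col2 FROM sales WHERE col1 > 100 LIMIT 10;
```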
I'm trying to run a Hive query using Amazon EMR, and I'm trying to get Apache Tez working with it too, which, as I understand it from the Hive site, requires setting the hive.execution.engine property to tez?
I get that hive properties can…
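A minimal sketch, assuming Tez is installed on the cluster: the engine can be switched per session with SET, or cluster-wide by putting the same property in hive-site.xml (on EMR, through the `hive-site` configuration classification).

```sql
SET hive.execution.engine=tez;  -- mr switches back to classic MapReduce
SET hive.execution.engine;      -- prints the engine in effect for this session
```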
I am setting the following property in hive-site.xml:
<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
</property>
However, in the Hive console, if I run show conf "hive.exec.dynamic.partition.mode"; I get strict…
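One thing worth checking, sketched below: according to the Hive language manual, SHOW CONF describes a property's built-in default value, type and description, not the value in effect for the session, so it can report strict even when hive-site.xml sets nonstrict. SET with a property name and no value prints the current session value instead.

```sql
-- Built-in default, type and description (not the session value)
SHOW CONF "hive.exec.dynamic.partition.mode";

-- Value currently in effect for this session, printed as name=value
SET hive.exec.dynamic.partition.mode;
```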
I want Hive to automatically acquire a Kerberos ticket whenever Hive (more specifically the Hive shell, not HiveServer) is executed, and also to renew it automatically if a job runs longer than the ticket lifetime.
I found similar functionality in Pig. See…
Hi, I am new to Hive and Kerberos.
I have some Hive jobs that run longer than the ticket lifetime. How can I configure Hive so that, when I start the Hive shell and no ticket is cached, it automatically requests one? After acquiring the ticket, let's…
I have this exercise: show the top 5 game disciplines for the countries that won more than 10 gold medals.
My code is: select distinct t.discipline, m.team from teams t join medals m on (t.noc=m.team and m.numbergold>10) order by m.team;
cloud…
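A sketch of one reading of the exercise, assuming `teams(noc, discipline)` and `medals(team, numbergold)` as in the query above, and ranking disciplines by how many qualifying countries take part in them (the exercise does not say how "top" is measured). The gold-medal condition moves into WHERE, and GROUP BY with ORDER BY ... LIMIT 5 keeps the top five.

```sql
-- Count qualifying countries per discipline and keep the five largest counts
SELECT t.discipline,
       COUNT(DISTINCT t.noc) AS num_countries
FROM teams t
JOIN medals m ON t.noc = m.team
WHERE m.numbergold > 10
GROUP BY t.discipline
ORDER BY num_countries DESC
LIMIT 5;
```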
It is my maiden voyage into Hive.
I have multiple Hive tables, like snapshots with names as follows:
revenue_20110131
reveue_20110228
revenue_20110331
purchases_qrt1
purchases_qrt2
purchases_qrt3
purchases_qrt4
I have a lot of such snapshot…
How can I prevent users from overriding a default property in the Hadoop configuration files when they submit Hive jobs?
Example:
mapred-site.xml:
<property>
  <name>mapreduce.job.heap.memory-mb.ratio</name>
  <value>0.8</value>
</property>
User use…
There is a property in Pig named
'pig.maxCombinedSplitSize' – Specifies the size, in bytes, of data to be processed by a single map. Smaller files are combined until this size is reached.
Is there a similar property in Hive for specifying the size…
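A sketch of the closest Hive equivalent, assuming MapReduce as the execution engine: CombineHiveInputFormat (the default `hive.input.format`) combines small files into larger splits, and the split size bounds come from the MapReduce split properties (older releases use names starting with `mapred.`, such as `mapred.max.split.size`). On Tez, grouping is controlled by `tez.grouping.min-size` and `tez.grouping.max-size` instead. The byte values below are examples, not recommendations.

```sql
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
-- Upper bound on the data combined into one split, similar to pig.maxCombinedSplitSize
SET mapreduce.input.fileinputformat.split.maxsize=268435456;            -- 256 MB
-- Lower bounds used when grouping blocks per node and per rack
SET mapreduce.input.fileinputformat.split.minsize.per.node=134217728;   -- 128 MB
SET mapreduce.input.fileinputformat.split.minsize.per.rack=134217728;   -- 128 MB
```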