Questions tagged [apache-hive]

Apache Hive supports analysis of large datasets stored in Hadoop's HDFS and compatible file systems such as the Amazon S3 filesystem. It provides an SQL-like language called HiveQL with schema on read and transparently converts queries to MapReduce, Apache Tez[7] and Spark jobs. All three execution engines can run on Hadoop YARN. To accelerate queries, older releases also provided indexes, including bitmap indexes; indexing was removed in Hive 3.0.
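To make "schema on read" concrete, here is a minimal Python sketch (not Hive code). The raw lines are stored untouched and a table schema is applied only when rows are read. The schema, column names and sample rows are hypothetical; the Ctrl-A (`\x01`) field delimiter and the "unparseable or missing value becomes NULL" behaviour are assumed to mirror Hive's default text format.

```python
# Minimal sketch of schema on read: raw text stays as-is, and a schema is
# applied at read time. Values that cannot be parsed as the declared type,
# or trailing columns missing from a line, come back as None (NULL).

# Hypothetical table definition: (column name, parser) pairs.
SCHEMA = [
    ("name", str),
    ("weight", int),
]

# Assumed to match Hive's default field delimiter for text tables (Ctrl-A).
FIELD_DELIM = "\x01"


def read_with_schema(raw_lines, schema=SCHEMA, delim=FIELD_DELIM):
    """Apply `schema` to each raw line only when it is read."""
    rows = []
    for line in raw_lines:
        fields = line.rstrip("\n").split(delim)
        row = {}
        for i, (col, parse) in enumerate(schema):
            value = None  # missing trailing field -> NULL
            if i < len(fields):
                try:
                    value = parse(fields[i])
                except ValueError:
                    value = None  # unparseable value -> NULL, not a load error
            row[col] = value
        rows.append(row)
    return rows


raw = ["alice\x01250\n", "bob\x01n/a\n", "carol\n"]
for r in read_with_schema(raw):
    print(r)
```

The design point is that a bad value only surfaces as NULL when queried; nothing validates the file when it is loaded.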

Some features:

1. Indexing to accelerate queries, with compact and bitmap index types as of 0.10; more index types were planned at the time.
2. Different storage types such as plain text, RCFile, HBase, ORC, and others.
3. Metadata storage in an RDBMS, significantly reducing the time to perform semantic checks during query execution.
4. Operating on compressed data stored in the Hadoop ecosystem using algorithms including DEFLATE, BWT, Snappy, etc.
5. Built-in user-defined functions (UDFs) to manipulate dates, strings, and other data-mining tools. Hive supports extending the UDF set to handle use cases not supported by built-in functions.
6. SQL-like queries (HiveQL), which are implicitly converted into MapReduce, Tez, or Spark jobs.
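The bitmap index idea mentioned in the first feature can be sketched in a few lines of Python. This only illustrates the general technique, not Hive's implementation; the function names and sample column are hypothetical. Each distinct value gets one bit per row, and an `IN (...)` predicate becomes a bitwise OR of the per-value bitmaps.

```python
# Conceptual bitmap index on a low-cardinality column:
# bit i of index[value] is set when row i holds that value.


def build_bitmap_index(column):
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def rows_matching(index, values):
    """Row ids whose value is in `values` (OR of the per-value bitmaps)."""
    bits = 0
    for v in values:
        bits |= index.get(v, 0)
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


directions = ["SW", "SW", "NW", "SE", "NW"]
idx = build_bitmap_index(directions)
print(rows_matching(idx, {"NW", "SE"}))  # -> [2, 3, 4]
```

Bitmaps stay small when a column has few distinct values, which is why this index type suits low-cardinality columns.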

96 questions
0 votes, 0 answers

Installing Apache Hive on Windows without using any virtual machine

Recently I started learning Hive, so I wanted to try it hands-on, but I can't find any tutorial for installing Hive on a Windows machine. My constraints are: 1. I cannot install Linux on my machine alongside Windows as a…
Supradeep
0 votes, 1 answer

Apache Spark's deployment issue (cluster-mode) with Hive

EDIT: I'm developing a Spark application that reads data from multiple structured schemas, and I'm trying to aggregate the information from those schemas. My application runs well when I run it locally. But when I run it on a cluster, I'm…
accssharma
0 votes, 1 answer

HIVE/HiveQL Get Max count

Sample data:
DATE       WindDirection
1/1/2000   SW
1/2/2000   SW
1/3/2000   SW
1/4/2000   NW
1/5/2000   NW
Question: every day is unique, but wind direction is not unique, so now we are trying to get the COUNT of the most…
dedpo
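The logic this question needs (group by direction, count, then keep the largest count) can be sketched in Python over the sample rows above. This is an illustration of the aggregation, not a HiveQL answer; in HiveQL a common form is a grouped subquery ordered by the count with `LIMIT 1`, which returns only one row when there are ties. The helper name is hypothetical.

```python
# GROUP BY direction + COUNT(*), then keep the group(s) whose count equals
# the maximum count (ties are kept rather than cut off arbitrarily).
from collections import Counter

readings = [
    ("1/1/2000", "SW"),
    ("1/2/2000", "SW"),
    ("1/3/2000", "SW"),
    ("1/4/2000", "NW"),
    ("1/5/2000", "NW"),
]


def most_common_directions(rows):
    counts = Counter(direction for _, direction in rows)  # GROUP BY + COUNT(*)
    top = max(counts.values())                            # MAX over the counts
    return {d: c for d, c in counts.items() if c == top}  # keep ties


print(most_common_directions(readings))  # -> {'SW': 3}
```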
0 votes, 1 answer

trying to find the max of select statement in HIVE

I am trying to return the top person by weight in the script below. I have a working version further below which returns Matt Holiday with 250 as weight, and that is all I want: the player with the max weight and him only, not anyone else. SELECT DISTINCT…
dedpo
0 votes, 0 answers

GenericUDF of hive execute twice on Spark

Hello, I am facing a problem creating a Hive GenericUDF and registering it as a temporary function: when I call it, it is executed twice. See the code below. I created a GenericUDF with the following code: class GenUDF extends GenericUDF{ var queryOI:…
Sandeep Purohit
0 votes, 1 answer

Joining one table two times in hive

I have no idea how to implement this in Hive. Please suggest an approach. Assume I have Hive tables like these:
Table1:
id | primary | secondary
-------------------------
1  | A       | [B,C]
2  | B       | [A]
3  | C       | [A,B]
Table2
id |…
Santhosh Tangudu
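A sketch of the usual shape of this problem, in Python rather than HiveQL: first turn the `secondary` array into one row per element (what `LATERAL VIEW explode(secondary)` does in Hive), then look up both `primary` and the exploded value in the same second table, as if joining it twice under two aliases. Table2's real columns are cut off in the question, so the `table2` mapping below is a hypothetical stand-in, as are the helper names.

```python
# Table1 rows from the question: (id, primary, secondary array).
table1 = [
    (1, "A", ["B", "C"]),
    (2, "B", ["A"]),
    (3, "C", ["A", "B"]),
]

# Hypothetical stand-in for Table2 (its real columns are not shown):
# maps a key to some attribute.
table2 = {"A": "alpha", "B": "beta", "C": "gamma"}


def explode_secondary(rows):
    """Like LATERAL VIEW explode(secondary): one output row per element."""
    for row_id, primary, secondary in rows:
        for sec in secondary:
            yield row_id, primary, sec


def join_twice(rows, lookup):
    """Use the same lookup twice: once for `primary`, once for the exploded
    `secondary` value. Inner-join semantics: unmatched rows are dropped."""
    out = []
    for row_id, primary, sec in explode_secondary(rows):
        if primary in lookup and sec in lookup:
            out.append((row_id, lookup[primary], lookup[sec]))
    return out


for r in join_twice(table1, table2):
    print(r)
```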
0 votes, 1 answer

Save hive table after performing simple HQL on different hive cluster without export+distcp+import

I have a table A in cluster X. I want to perform some HQL (say select * from A where A.country = 'INDIA') and save the output in table B in cluster Y. I can perform HQL on table A and store the data in table temp. Then, export this Hive table to table B in…
Dev
0 votes, 1 answer

simple JSON file analysing in Hive-0.14 using serde

I am trying to run Hive commands on a JSON file using JSON SerDes, but I always get null values instead of the actual data. I have used the SerDes provided at the "code.google.com/p/hive-json-serde/downloads/list" link. I have tried multiple ways but all…
sanumala
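One frequent cause of all-null results with text-based JSON SerDes (an assumption here, since the question is truncated) is that each record must sit on a single line: a pretty-printed or array-wrapped file is split by newline before the SerDe ever sees it. A minimal Python sketch that rewrites such a file as one compact JSON object per line; the function name is hypothetical.

```python
import json


def to_json_lines(text):
    """Rewrite a JSON document (one object or an array of objects) as one
    compact JSON object per line."""
    data = json.loads(text)
    records = data if isinstance(data, list) else [data]
    return "\n".join(json.dumps(rec, separators=(",", ":")) for rec in records) + "\n"


pretty = """[
  {"id": 1, "name": "a"},
  {"id": 2, "name": "b"}
]"""
print(to_json_lines(pretty), end="")
```

Column names in the table definition still need to match the JSON keys; this sketch only fixes the record layout.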
0 votes, 0 answers

how to get mapreduce job number from hive server

If I use the Hive CLI, the log is: Total MapReduce jobs = 1 Stage-1 is selected by condition resolver. Launching Job 1 out of 1. But in HiveServer or Beeline, the log is: INFO : Stage-1 is selected by condition resolver. INFO : Number of reduce tasks…
Willow
0 votes, 0 answers

Apache Hive and record updates

I have streaming data coming into my consumer app that I ultimately want to show up in Hive/Impala. One way would be to use Hive-based APIs to insert the updates into the Hive table in batches. The alternative approach is to write the data directly…
Neel
0 votes, 1 answer

Need Alternative query in Apache Hive limit

I need an alternative to the query below. Select a.name,max(a.cnt) from (Select name,count(name) as cnt from candidate group by name) a group by a.name order by 2 desc limit 1; drop table if exists candidate; create external table…
Venkadesh Venkat
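One alternative to `ORDER BY ... LIMIT 1` in Hive versions with windowing support is to rank the grouped counts with `RANK() OVER (ORDER BY cnt DESC)` and keep rank 1; unlike `LIMIT 1`, that keeps every name tied for the top count. The Python sketch below only mirrors those semantics; the `candidate` rows and helper name are hypothetical.

```python
from collections import Counter


def rank_desc(counts):
    """SQL RANK()-style ranks by descending count: ties share a rank and the
    next rank skips (1, 1, 3, ...)."""
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    ranked = []
    prev_cnt = None
    rank = 0
    for position, (name, cnt) in enumerate(ordered, start=1):
        if cnt != prev_cnt:
            rank = position
            prev_cnt = cnt
        ranked.append((name, cnt, rank))
    return ranked


candidate = ["ann", "bob", "ann", "cid", "bob", "dan"]  # hypothetical rows
counts = Counter(candidate)  # inner query: GROUP BY name, COUNT(name)
top = [(n, c) for n, c, r in rank_desc(counts) if r == 1]  # outer: rank = 1
print(top)  # -> [('ann', 2), ('bob', 2)]
```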
0 votes, 1 answer

Hive query counts of fields where fields are populated

I have a huge Hive table consisting of ten product fields, date fields for the purchases, and an identifier. The product fields are named like prod1, prod2, ... , prod10 and refer to the last ten products purchased. For most IDs, we don't have…
economy
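In SQL, `COUNT(col)` counts only non-NULL values, so a single `SELECT COUNT(prod1), COUNT(prod2), ...` pass gives the number of populated values per product field. The Python sketch below mirrors that per-column count; treating empty strings as unpopulated is an assumption that depends on how the table encodes missing products, and the sample rows are hypothetical.

```python
# Count, per product column, how many rows have a populated value.
PROD_COLS = [f"prod{i}" for i in range(1, 11)]


def populated_counts(rows, cols=PROD_COLS):
    counts = {c: 0 for c in cols}
    for row in rows:
        for c in cols:
            v = row.get(c)
            # None ~ NULL; "" also treated as unpopulated (assumption).
            if v is not None and v != "":
                counts[c] += 1
    return counts


rows = [
    {"id": 1, "prod1": "x", "prod2": "y"},
    {"id": 2, "prod1": "z", "prod2": None},
    {"id": 3, "prod1": ""},
]
c = populated_counts(rows)
print(c["prod1"], c["prod2"], c["prod10"])  # -> 2 1 0
```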
0 votes, 0 answers

Hive JDBC MapRedTask Failed

I have written Java code to access Apache Hive tables. import java.sql.*; public class HiveQL { private static String drivername = "org.apache.hive.jdbc.HiveDriver"; public static void main(String[] args) throws SQLException { …
0 votes, 2 answers

How to handle newline character in HIVE on HBase?

I am inserting data into HBase from my Java program. Since everything must be converted to byte arrays before inserting into HBase, I am doing so. But when my input string contains a newline character, it is stored as hexadecimal values in HBase (Eg: I…
prasad
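A possible explanation, offered as an assumption: the HBase shell renders cell bytes in a "binary string" form where printable ASCII is shown as-is and other bytes, such as a newline, appear as `\xNN` escapes. The stored bytes may still contain the newline intact. The Python sketch below only approximates that rendering (the exact set of bytes HBase escapes may differ); the function name is hypothetical.

```python
def to_string_binary(data: bytes) -> str:
    """Approximate the HBase shell's display: printable ASCII as-is, other
    bytes (and the backslash itself) as \\xNN with uppercase hex."""
    out = []
    for b in data:
        if 0x20 <= b <= 0x7E and b != 0x5C:
            out.append(chr(b))
        else:
            out.append("\\x%02X" % b)
    return "".join(out)


value = "line1\nline2".encode("utf-8")
print(to_string_binary(value))  # -> line1\x0Aline2
print(value.split(b"\n"))       # the newline byte itself is still there
```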
0 votes, 1 answer

User Concurrency is not working in Spark for hive

I have configured a 3-node Spark (version 1.4.0) cluster environment with Hive version 0.13.1 and started the Spark Thrift service using ./sbin/start-thriftserver.sh. Multiple users are using the same Thrift service with the same port and different…
Kaushal