Questions tagged [hive-partitions]

Use this tag for questions about partitions in Hive.

Partitioning is a way of dividing a table into related parts based on the values of partition columns such as date, city, and department. Because each partition holds only the rows for its column values, a query that filters on those columns reads only a portion of the data.

Partitions are essentially horizontal slices of data that allow larger data sets to be separated into more manageable chunks. In Hive, partitioning is supported for both managed and external tables and is declared in the table definition.
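
As a rough illustration of the description above, the following HiveQL sketch declares one managed and one external partitioned table. The table names (`sales_managed`, `sales_external`), the columns, the storage formats, and the `/data/sales_external` location are all hypothetical and chosen only for this example.

```sql
-- Managed table: Hive owns both the metadata and the data files.
-- Partition columns (dt, city) are declared separately from regular columns.
CREATE TABLE IF NOT EXISTS sales_managed (
  id     BIGINT,
  amount DOUBLE
)
PARTITIONED BY (dt STRING, city STRING)
STORED AS ORC;

-- External table: Hive tracks metadata only; the files live at LOCATION.
-- The path below is a placeholder, not a real location.
CREATE EXTERNAL TABLE IF NOT EXISTS sales_external (
  id     BIGINT,
  amount DOUBLE
)
PARTITIONED BY (dt STRING, city STRING)
STORED AS PARQUET
LOCATION '/data/sales_external';

-- Register one partition explicitly so the metastore knows about it.
ALTER TABLE sales_external
  ADD IF NOT EXISTS PARTITION (dt = '2020-01-01', city = 'Austin');

-- Filtering on partition columns lets Hive read only the matching partition.
SELECT id, amount
FROM sales_external
WHERE dt = '2020-01-01' AND city = 'Austin';
```

For an external table whose directories already follow the `dt=.../city=...` layout, `MSCK REPAIR TABLE sales_external;` is a common alternative to adding partitions one at a time.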

144 questions

2 answers

How to read data partitions in S3 from Trino

I'm trying to read data partitions in S3 from Trino. What I did exactly: I uploaded my data with all partitions into S3. I have a specified Avro schema, which I put on the local file system. Then I created an external Hive table to point to the data…

asked Jan 28 '21 at 19:40

Isabelle

3 answers

Add New Partition to Hive External Table via Databricks

I have a folder which previously had subfolders based on ingestiontime, which is also the original PARTITION used in its Hive table. So the folder looks as…

apache-spark amazon-s3 hive databricks hive-partitions

asked Jul 13 '20 at 10:52

Golokesh Patra

1 answer

Hive ALTER command to drop partition having values older than 24 months

I have a Hive table (consumer_data) with partition column 'val_dt', which is a string column holding values in the date format 'yyyy-MM'. The table has multiple partitions, from '2015-01' to '2020-04'. Each month the data is added incrementally…

shell hive hiveql beeline hive-partitions

asked May 09 '20 at 05:05

Ajay

1 answer

How to get the latest partition data from Hive

I need to fetch all records from the latest partitions of a table in Hive. The table is partitioned by date, year, month, e.g. (date=25,year=2020,month=3), and there will be many such partitions. The partitions are not static and it will be…

hive hiveql hadoop-streaming hive-partitions

asked Mar 24 '20 at 10:28

Anusha Radhakrishnan

1 answer

Why do we need to set properties for dynamic partitioning in Hive?

I would like to know one thing about Hive dynamic partitioning. When using dynamic partitions we have to set the following properties: SET hive.exec.dynamic.partition = true; SET hive.exec.dynamic.partition.mode = nonstrict; Without those properties we…

apache hive hive-partitions hive-configuration

asked Feb 04 '20 at 12:16

nanda kumar

1 answer

Get latest data from Hive table with multiple partition columns

I have a Hive table with the structure below: ID string, Value string, year int, month int, day int, hour int, minute int. This table is refreshed every 15 minutes and is partitioned by the year/month/day/hour/minute columns. Please find below samples on…

performance hive hiveql partition hive-partitions

asked Jan 10 '20 at 01:58

learn_more

2 answers

Spark (EMR) Partition Pruning Behavior for Multi-Level Partitioned Table

If I have a table created with multi-level partitions, i.e. comprising two columns (state, city), as follows: state=CA,city=Anaheim state=Texas,city=Houston state=Texas,city=Dallas state=Texas,city=Austin …

apache-spark hive apache-spark-sql amazon-emr hive-partitions

asked Nov 12 '19 at 05:31

rh979

1 answer

How to create partitioned Hive table on dynamic HDFS directories

I am having difficulty getting Hive to discover partitions which are created in HDFS. Here's the directory structure in…

hadoop hive create-table hive-partitions hiveddl

asked Oct 25 '19 at 11:40

guru107

1 answer

Why select distinct partitioned column is very slow?

I have a table zhihu_answer_increment, partitioned by column ym. When I execute the query select distinct(ym) from zhihu.zhihu_answer_increment;, it takes over 1 min to finish. During the process, Hive launched a map-reduce job. Here is the…

performance hive query-optimization hive-partitions

asked Oct 15 '19 at 03:46

DennisLi

1 answer

Adding partitions to an external table in Hive takes a lot of time

I would like to know the best possible way(s) of adding partitions to an external table. I have an external table on S3 in Hive partitioned as vehicle=/date=/hr=. A new vehicle can be added at any time of day and there will be…

hive partition hive-partitions hiveddl

asked Sep 11 '19 at 05:11

Nipun

2 answers

Automatically Updating a Hive View Daily

I have a requirement I want to meet. I need to sqoop over data from a DB to Hive. I am sqooping on a daily basis since this data is updated daily. This data will be used as lookup data from a spark consumer for enrichment. We want to keep a history…

apache-spark hadoop hive hive-partitions

asked Aug 05 '19 at 21:41

binaryhex

1 answer

Hive: Add partitions for existing folder structure

I have a folder structure in HDFS like below. However, no partitions were actually created on the table using the ALTER TABLE ADD PARTITION commands, even though the folder structure was set up as if the table had partitions. How can I automatically…

hadoop hive hdfs partitioning hive-partitions

asked Jul 15 '19 at 03:04

lifebythedrop

1 answer

Hive: drop all partitions, keep recent 4 days' partitions

I have a table with partitions like below: TABLE logs PARTITION(year = 2019, month = 06, day = 18). The partition columns 'year', 'month' and 'day' are in string format. I need to drop partitions while keeping the last seven days' partitions, and need to run the job…

hive hiveql hive-partitions hiveddl

asked Jun 18 '19 at 09:15

Deamon

1 answer

How do we drop partitions in Hive with regex? Is it possible?

I am trying to run the following: alter table historical_data drop partition (my_date not rlike '[A-Za-z]'); which gives me an exception: org.apache.hadoop.hive.ql.parse.ParseException: line 2:69 mismatched input 'not' expecting set null in drop…

regex hive hive-partitions hiveddl

asked Apr 25 '19 at 11:18

Akshay Hazari

1 answer

How do I make it such that the result of a query is partitioned as the input?

I'm new to Hive, so a basic question: How do I create a query such that the result of that query is partitioned in a specific way? For example: CREATE TABLE IF NOT EXISTS tbl_x ( x SMALLINT, y FLOAT) PARTITIONED BY (id SMALLINT) ROW FORMAT…

hive hiveql create-table hive-partitions hiveddl

asked Apr 04 '19 at 22:47

Leo Barlach
