Questions tagged [hive-partitions]

Use this tag for questions about partitions in Hive.

Partitioning divides a table into related parts based on the values of partition columns such as date, city, and department. A query that filters on those columns can read only the matching partitions instead of the whole table.

Partitions are essentially horizontal slices of data that allow larger data sets to be separated into more manageable chunks. In Hive, both managed and external tables can be partitioned; the partition columns are declared in the table definition.
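As a minimal sketch of what such table definitions can look like, the HiveQL below declares one managed and one external partitioned table, registers a partition, and queries a single slice. The table names (`sales`, `sales_ext`), the columns, the partition values, and the `LOCATION` path are hypothetical and chosen only for illustration.

```sql
-- Managed table: Hive owns the data files.
-- Partition columns are declared in PARTITIONED BY, not in the column list.
CREATE TABLE IF NOT EXISTS sales (
  order_id BIGINT,
  amount   DOUBLE
)
PARTITIONED BY (sale_date STRING, city STRING)
STORED AS ORC;

-- External table: Hive only tracks metadata for files that already exist.
-- The LOCATION path is a placeholder.
CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext (
  order_id BIGINT,
  amount   DOUBLE
)
PARTITIONED BY (sale_date STRING, city STRING)
STORED AS PARQUET
LOCATION '/data/sales_ext';

-- Register one partition explicitly in the metastore.
ALTER TABLE sales_ext ADD IF NOT EXISTS
  PARTITION (sale_date = '2020-03-25', city = 'Austin');

-- Filtering on the partition columns lets Hive read only that partition.
SELECT order_id, amount
FROM sales_ext
WHERE sale_date = '2020-03-25'
  AND city = 'Austin';
```

Each partition corresponds to a subdirectory such as `sale_date=2020-03-25/city=Austin` under the table location, which is why a filter on the partition columns can skip the rest of the data.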

144 questions
2 votes, 2 answers

How to read data partitions in S3 from Trino

I'm trying to read data partitions in S3 from Trino. What I did exactly: I uploaded my data with all partitions into S3. I have a specified Avro schema, which I put in a file on the local file system. Then I created an external Hive table to point to the data…
Isabelle
2 votes, 3 answers

Add New Partition to Hive External Table via Databricks

I have a folder that previously had subfolders based on ingestiontime, which is also the original PARTITION used in its Hive table. So the folder looks as…
2 votes, 1 answer

Hive ALTER command to drop partition having values older than 24 months

I have a Hive table (consumer_data) with a partition column 'val_dt', which is a string column with values in the date format 'yyyy-MM'. The table has multiple partitions, from '2015-01' to '2020-04'. Each month the data is added incrementally…
Ajay
2 votes, 1 answer

How to get the latest partition data from Hive

I need to fetch all records in the latest partitions of a Hive table. The table is partitioned by date, year, and month, e.g. (date=25, year=2020, month=3), and there will be many such partitions. The partitions are not static and it will be…
2 votes, 1 answer

Why do we need to set properties for dynamic partitioning in Hive?

I would like to know one thing about Hive dynamic partitioning. When using dynamic partitions we have to set the following properties: SET hive.exec.dynamic.partition = true; SET hive.exec.dynamic.partition.mode = nonstrict; Without those properties we…
2 votes, 1 answer

Get latest data from Hive table with multiple partition columns

I have a Hive table with the structure below: ID string, Value string, year int, month int, day int, hour int, minute int. This table is refreshed every 15 minutes and is partitioned by the year/month/day/hour/minute columns. Please find below samples on…
learn_more
2 votes, 2 answers

Spark (EMR) Partition Pruning Behavior for Multi-Level Partitioned Table

If I have a table created with multi-level partitions, i.e. comprising two columns (state, city), as follows: state=CA,city=Anaheim; state=Texas,city=Houston; state=Texas,city=Dallas; state=Texas,city=Austin …
rh979
2 votes, 1 answer

How to create a partitioned Hive table on dynamic HDFS directories

I am having difficulty getting Hive to discover partitions that are created in HDFS. Here's the directory structure in…
guru107
2 votes, 1 answer

Why is SELECT DISTINCT on a partition column very slow?

I have a table zhihu_answer_increment that is partitioned by the column ym. When I execute the query select distinct(ym) from zhihu.zhihu_answer_increment; it takes over 1 minute to finish. During the process, Hive launched a MapReduce job. Here is the…
DennisLi
2 votes, 1 answer

Adding partitions to the external table in hive takes a lot of time

I would like to know the best way(s) of adding partitions to an external table. I have an external table on S3 in Hive, partitioned as vehicle=/date=/hr=. A new vehicle can be added at any time of day and there will be…
Nipun
2 votes, 2 answers

Automatically Updating a Hive View Daily

I have a requirement I want to meet. I need to Sqoop data over from a DB to Hive. I run Sqoop on a daily basis since this data is updated daily. This data will be used as lookup data by a Spark consumer for enrichment. We want to keep a history…
binaryhex
2 votes, 1 answer

Hive: Add partitions for existing folder structure

I have a folder structure in HDFS like the one below. However, no partitions were actually created on the table using the ALTER TABLE ADD PARTITION commands, even though the folder structure was set up as if the table had partitions. How can I automatically…
lifebythedrop
2 votes, 1 answer

Hive: drop all partitions, keep the most recent 4 days' partitions

I have a table with partitions like the one below: TABLE logs PARTITION(year = 2019, month = 06, day = 18). The partition columns 'year', 'month' and 'day' are in string format. I need to drop partitions, keeping the last seven days' partitions, and need to run the job…
Deamon
2 votes, 1 answer

How do we drop partitions in Hive with a regex? Is it possible?

I am trying to run the following: alter table historical_data drop partition (my_date not rlike '[A-Za-z]'); which gives me an exception: org.apache.hadoop.hive.ql.parse.ParseException: line 2:69 mismatched input 'not' expecting set null in drop…
Akshay Hazari
2 votes, 1 answer

How do I make it such that the result of a query is partitioned as the input?

I'm new to Hive, so a basic question: How do I create a query such that the result of that query is partitioned in a specific way? For example: CREATE TABLE IF NOT EXISTS tbl_x ( x SMALLINT, y FLOAT) PARTITIONED BY (id SMALLINT) ROW FORMAT…
Leo Barlach