Questions tagged [hive-partitions]

Use this tag for questions about partitions in Hive.

Partitioning is a way of dividing a table into related parts based on the values of partition columns such as date, city, or department. With partitioning, a query that filters on a partition column can read only the matching portion of the data instead of scanning the whole table.
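As a rough illustration of why filtering on a partition column is cheap, the Python sketch below models a table as a mapping from partition values to rows. The table contents and the function name `select_partitions` are hypothetical, and this only models the idea of partition pruning, not Hive's actual query planner.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical table: each partition is keyed by (date, city) and holds its own rows.
Partition = Tuple[str, str]
Row = Dict[str, object]

table: Dict[Partition, List[Row]] = {
    ("2021-01-01", "London"): [{"id": 1, "amount": 10}],
    ("2021-01-01", "Paris"): [{"id": 2, "amount": 20}],
    ("2021-01-02", "London"): [{"id": 3, "amount": 30}, {"id": 4, "amount": 40}],
}


def select_partitions(
    table: Dict[Partition, List[Row]],
    date: Optional[str] = None,
    city: Optional[str] = None,
) -> Tuple[List[Row], int]:
    """Return rows from partitions matching the filter, plus how many partitions were read.

    Only the partition keys (the metadata) are checked for every partition;
    the rows of non-matching partitions are never touched, which is the idea
    behind partition pruning.
    """
    rows: List[Row] = []
    partitions_read = 0
    for (p_date, p_city), part_rows in table.items():
        if date is not None and p_date != date:
            continue  # pruned: this partition's rows are skipped
        if city is not None and p_city != city:
            continue  # pruned
        partitions_read += 1
        rows.extend(part_rows)
    return rows, partitions_read


rows, read = select_partitions(table, date="2021-01-02")
print(len(rows), read)  # 2 1
```

A filter on `date` touches one of the three partitions; a query with no partition filter would have to read all of them.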

Partitions are essentially horizontal slices of data that allow large data sets to be separated into more manageable chunks. In Hive, partitioning is supported for both managed and external tables and is declared in the table definition with the PARTITIONED BY clause.
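Physically, Hive stores each partition in its own subdirectory named `column=value`, nested in the order of the partition columns (for example `.../year=2021/month=01/day=01/`). The Python sketch below builds and parses such paths. The function names and the base path `/warehouse/sales` are hypothetical, and real Hive additionally escapes certain special characters in partition values, which this sketch ignores.

```python
from typing import Dict, List, Tuple


def partition_path(base: str, spec: List[Tuple[str, str]]) -> str:
    """Build a Hive-style partition directory, e.g. base/year=2021/month=01."""
    # One directory level per partition column, in declaration order.
    parts = [f"{col}={val}" for col, val in spec]
    return "/".join([base.rstrip("/")] + parts)


def parse_partition_path(path: str) -> Dict[str, str]:
    """Extract the column=value pairs from a path such as Data/Year=2015/Month=01/Day=01/."""
    spec: Dict[str, str] = {}
    for segment in path.strip("/").split("/"):
        if "=" in segment:  # non-partition segments (e.g. "Data") are skipped
            col, _, val = segment.partition("=")
            spec[col] = val
    return spec


p = partition_path("/warehouse/sales", [("year", "2021"), ("month", "01"), ("day", "01")])
print(p)  # /warehouse/sales/year=2021/month=01/day=01
print(parse_partition_path("Data/Year=2015/Month=01/Day=02/"))
# {'Year': '2015', 'Month': '01', 'Day': '02'}
```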

144 questions

1 answer

How to partition a Hive Table using range of values for a column

I have a Hive table with 2 columns, Employee ID and Salary. The data looks like this:

Employee ID   Salary
1             10000.08
2             20078.67
3             20056.45
4             30000.76
5             10045.14
6             43567.76

I want to create partitions based on the Salary column. For…

asked Jul 30 '17 at 08:56

Surbhi

1 answer

S3 hive external table on subdirectories is not working

I have the following S3 directory structure:

Data/
  Year=2015/
    Month=01/
      Day=01/
        files
      Day=02/
        files
    Month=02/
      Day=01/
        files
      Day=02/
        …

amazon-s3 hive hiveql hive-partitions hiveddl

asked Oct 22 '15 at 10:39

user3313379

1 answer

Performance of pyspark + hive when a table has many partition columns

I am trying to understand the performance impact of the partitioning scheme when Spark is used to query a Hive table. As an example: Table 1 has 3 partition columns, and data is stored in paths like year=2021/month=01/day=01/...data... Table 2 has…

apache-spark pyspark hive hive-partitions

asked Dec 19 '21 at 07:34

GeorgeWilson

1 answer

Repartition in Hadoop

My question is mostly theoretical. I have some tables that already follow a partition scheme; let's say my table is partitioned by day, but after working with the data for some time we want to switch to monthly partitions instead. I…

hadoop hive azure-hdinsight hive-partitions hiveddl

asked Aug 11 '21 at 10:16

frammnm

1 answer

Reg: Efficiency among query optimizers in Hive

After reading about query optimization techniques, I learned about the following: 1. Indexing (bitmap and B-tree), 2. Partitioning, 3. Bucketing. I understand the difference between partitioning and bucketing, and when to use each, but I'm still…

hadoop indexing hive hiveql hive-partitions

asked Apr 19 '20 at 06:56

Anand

1 answer

Hive Not Utilizing Partitions in Query

I have a view that pulls the most recent data from a Hive history table. The history table is partitioned by day. The view is very straightforward: it has a subquery that takes the max date on the date field (the one that is…

hive subquery hive-partitions

asked Sep 11 '19 at 13:40

Jeff E

2 answers

How to insert overwrite partitions only if the partitions do not exist in Hive?

How can I insert overwrite partitions only if they do not already exist in Hive? As the title says: I'm working on something that always needs to rewrite Hive tables. I have tables that have multiple partitions, and I only want to insert new partitions without…

hive hiveql hive-partitions

asked Sep 06 '19 at 03:33

Yang

1 answer

How to insert/copy one partition's data to multiple partitions in hive?

I have data for day='2019-01-01' in my Hive table, and I want to copy the same data to the whole month of January 2019 (i.e. to '2019-01-02', '2019-01-03' ... '2019-01-31'). I'm trying the following, but the data is only inserted into '2019-01-02' and not into…

hive calendar hiveql date-range hive-partitions

asked May 10 '19 at 05:32

axnet

1 answer

Insert overwrite on partitioned table is not deleting the existing data

I am trying to run insert overwrite on a partitioned table. The select query of the insert overwrite omits one partition completely. Is this the expected behavior? Table definition: CREATE TABLE `cities_red`( …

hive hiveql hive-partitions

asked Apr 19 '19 at 08:14

Ayoush Agarwal

3 answers

Hive external table optimal partition size

What is the optimal size for an external table partition? I am planning to partition the table by year/month/day, and we receive about 2 GB of data daily.

hive partitioning create-table partition hive-partitions

asked Jun 01 '16 at 17:52

Igor K.

1 answer

How to determine whether a table is partitioned dynamically or statically in Hive

I'm trying to find the list of tables that have dynamic partitions in Hive. I tried the following commands but found no clue on how to do it. Commands tried: show partitions, describe formatted

hadoop hive hiveql beeline hive-partitions

asked Apr 26 '16 at 20:53

William R

1 answer

In Foundry, how can I Hive partition with only 1 parquet file per value?

I'm looking to improve the performance of running filtering logic. To accomplish this, the idea is to set up Hive partitioning by setting the partition column to a column in the dataset (called splittable_column). I checked and the cardinality…

pyspark palantir-foundry hive-partitions foundry-code-repositories foundry-code-workbooks

asked Jun 29 '22 at 17:48

Andrew Andrade

1 answer

How to automatically update the Hive external table metadata partitions for streaming data

I am writing Spark streaming data into HDFS partitions using PySpark. Here is the code: data = (spark.readStream.format("json").schema(fileSchema).load(inputDirectoryOfJsonFiles)) output = (data.writeStream .format("parquet") …

apache-spark pyspark hive spark-streaming hive-partitions

asked Feb 13 '22 at 18:13

nani

1 answer

Querying based on Partition and non-partition column in Hive

I have an external Hive table as follows: CREATE external TABLE sales ( ItemNbr STRING, itemShippedQty INT, itemDeptNbr SMALLINT, gateOutUserId STRING, code VARCHAR(3), trackingId STRING, baseDivCode STRING ) PARTITIONED BY (countryCode STRING,…

hive parquet hadoop-partitioning hive-partitions

asked Jul 24 '21 at 18:06

Neer1009

1 answer

Hive: read table partitions defined in subselect

I have a Hive table that is partitioned by the partitionDate field. I can read a partition of my choice with a simple select * from myTable where partitionDate = '2000-01-01'. My task is to specify the partition of my choice dynamically, i.e. first I want…

sql hive query-optimization partition hive-partitions

asked Jul 21 '21 at 06:53

MiamiBeach
