Questions tagged [hive-partitions]

Use this tag for questions about partitions in Hive.

Partitioning is a way of dividing a table into related parts based on the values of partition columns such as date, city, and department. With partitioning, a query that filters on those columns can read just a portion of the data.

Partitions are essentially horizontal slices of data which allow larger sets of data to be separated into more manageable chunks. In Hive, partitioning is supported for both managed and external tables and is declared in the table definition.
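As a rough illustration of the idea, the sketch below models the key=value directory layout that partitioned tables typically use (for example year=2021/month=01/day=01) and shows how a filter on partition columns narrows the set of directories to read. It is a toy model in plain Python, not Hive's implementation: the table root `/warehouse/sales` and the helper names `partition_path`, `parse_partition`, and `prune` are all hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical table location, used only for this illustration.
TABLE_ROOT = "/warehouse/sales"


def partition_path(root, spec):
    """Build a Hive-style directory such as root/year=2021/month=01/day=01."""
    return str(PurePosixPath(root, *(f"{col}={val}" for col, val in spec.items())))


def parse_partition(path, root):
    """Recover {column: value} from the key=value segments below the table root."""
    relative = PurePosixPath(path).relative_to(root)
    return dict(segment.split("=", 1) for segment in relative.parts)


def prune(paths, root, **filters):
    """Keep only the partition directories that match every filter value."""
    return [
        p for p in paths
        if all(parse_partition(p, root).get(col) == val for col, val in filters.items())
    ]


# Four daily partitions spread over two months of 2021.
all_partitions = [
    partition_path(TABLE_ROOT, {"year": "2021", "month": m, "day": d})
    for m in ("01", "02")
    for d in ("01", "02")
]

# A filter on partition columns narrows the scan to matching directories.
february = prune(all_partitions, TABLE_ROOT, year="2021", month="02")
for path in february:
    print(path)
```

Filtering on `year` and `month` leaves only the February directories, which mirrors why queries that filter on partition columns touch less data than a full table scan.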

144 questions
4 votes, 1 answer

How to partition a Hive Table using range of values for a column

I have a Hive table with two columns, Employee ID and Salary. The data looks like this (Employee ID, Salary): (1, 10000.08), (2, 20078.67), (3, 20056.45), (4, 30000.76), (5, 10045.14), (6, 43567.76). I want to create partitions based on the Salary column. For…
Surbhi • 43 • 1 • 3

4 votes, 1 answer

S3 hive external table on subdirectories is not working

I have the following S3 directory structure: Data/ Year=2015/ Month=01/ Day=01/ files Day=02/ files Month=02/ Day=01/ files Day=02/ …
user3313379 • 459 • 10 • 21

3 votes, 1 answer

Performance of pyspark + hive when a table has many partition columns

I am trying to understand the performance impact of the partitioning scheme when Spark is used to query a Hive table. As an example: Table 1 has 3 partition columns, and data is stored in paths like year=2021/month=01/day=01/...data... Table 2 has…
GeorgeWilson • 562 • 6 • 17

3 votes, 1 answer

Repartition in Hadoop

My question is mostly theoretical, but I have some tables that already follow some sort of partition scheme. Let's say my table is partitioned by day, but after working with the data for some time we want to change it to month partitions instead. I…
frammnm • 537 • 1 • 5 • 17

3 votes, 1 answer

Reg : Efficiency among query optimizers in hive

After reading about query optimization techniques, I came across the following: (1) indexing (bitmap and BTree), (2) partitioning, and (3) bucketing. I understand the difference between partitioning and bucketing, and when to use them, but I'm still…
Anand • 361 • 1 • 9 • 23

3 votes, 1 answer

Hive Not Utilizing Partitions in Query

I have a view that works to pull the most recent data for a Hive history table. The history table is partitioned by day. The way that the view works is very straightforward—it has a subquery that does a max date on the date field (the one that is…
Jeff E • 31 • 2

3 votes, 2 answers

How to insert overwrite partitions only if partitions not exists in HIVE?

As the title says: how can I insert overwrite partitions only if they do not already exist in Hive? I'm working on something that always needs to rewrite Hive tables. I have tables that have multiple partitions and I only want to insert new partitions without…
Yang • 754 • 2 • 8 • 22

3 votes, 1 answer

How to insert/copy one partition's data to multiple partitions in hive?

I have data for day='2019-01-01' in my Hive table, and I want to copy the same data to the whole of January 2019 (i.e. to '2019-01-02', '2019-01-03'...'2019-01-31'). I'm trying the following, but data is only inserted in '2019-01-02' and not in…
axnet • 5,146 • 3 • 25 • 45

3 votes, 1 answer

Insert overwrite on partitioned table is not deleting the existing data

I am trying to run insert overwrite over a partitioned table. The select query of insert overwrite omits one partition completely. Is it the expected behavior? Table definition CREATE TABLE `cities_red`( …
3 votes, 3 answers

Hive external table optimal partition size

What is the optimal size for an external table partition? I am planning to partition the table by year/month/day, and we are getting about 2GB of data daily.
Igor K. • 915 • 2 • 12 • 22

3 votes, 1 answer

How to get whether the table is partitioned by dynamic or static in hive

I am trying to find the list of tables that use dynamic partitioning in Hive. I tried the following commands but could not find a way. Commands tried: show partitions, describe formatted
William R • 739 • 2 • 13 • 34

2 votes, 1 answer

In Foundry, how can I Hive partition with only 1 parquet file per value?

I'm looking to improve the performance of running filtering logic. To accomplish this, the idea is to use Hive partitioning, setting the partition column to a column in the dataset (called splittable_column). I checked and the cardinality…
2 votes, 1 answer

How to automatically update the Hive external table metadata partitions for streaming data

I am writing Spark streaming data into HDFS partitions using PySpark. Here is the code: data = (spark.readStream.format("json").schema(fileSchema).load(inputDirectoryOfJsonFiles)) output = (data.writeStream .format("parquet") …
2 votes, 1 answer

Querying based on Partition and non-partition column in Hive

I have an external Hive table as follows: CREATE external TABLE sales ( ItemNbr STRING, itemShippedQty INT, itemDeptNbr SMALLINT, gateOutUserId STRING, code VARCHAR(3), trackingId STRING, baseDivCode STRING ) PARTITIONED BY (countryCode STRING,…
Neer1009 • 304 • 1 • 5 • 18

2 votes, 1 answer

Hive: read table partitions defined in subselect

I have a Hive table which is partitioned by the partitionDate field. I can read a partition of my choice via a simple select * from myTable where partitionDate = '2000-01-01'. My task is to specify the partition of my choice dynamically, i.e. first I want…
MiamiBeach • 3,261 • 6 • 28 • 54