Questions tagged [hive-partitions]

To be used for questions regarding partitions in hive.

Partitioning is a way of dividing a table into related parts based on the values of partitioned columns such as date, city, and department. Using partition, it is easy to query a portion of the data.

Partitions are essentially horizontal slices of data which allow larger sets of data to be separated into more manageable chunks. In Hive, partitioning is supported for both managed and external tables in the table definition as seen below.

144 questions
2
votes
1 answer

How do I partition a table by all values?

I have an external table, now I want to add partitions to it. I have 224 unique city id's and I want to just write alter table my_table add partition (cityid) location /path; but hive complains, saying that I don't provide anything for the city id…
pavel_orekhov
  • 1,657
  • 2
  • 15
  • 37
2
votes
1 answer

Hive - fetch only the latest partition of one or more hive tables

I have three partitioned tables (Yearly Partitions) in Hive and all the tables have multiple partitions. I am joining all these three tables as part of my requirement. Now I want to run this sql only for the latest partition and not for the ones…
JKC
  • 2,498
  • 6
  • 30
  • 56
2
votes
1 answer

unable to delete hive table partition contains special character Equal sign(=)

inserted data in Hive table with partition column(CL) value as ('CL=18') which stored as /db/tbname/CL=CL%3D18 (invalid partition contains url encoded special character for equal sign). As per hortonworks community , it was mentioned hive stored…
Girish501
  • 143
  • 9
2
votes
2 answers

hive add partition statement ignores leading zero

I've folder on hdfs /user/test/year=2016/month=04/dt=25/000000_0 Need to add this above partition path to a test table. command : ALTER TABLE test ADD IF NOT EXISTS PARTITION (year=2016,month=04,dt=25) But this add partition command is ignoring the…
pramav
  • 473
  • 3
  • 11
  • 20
1
vote
1 answer

Hive insert into partitioned table with colums list from select

I want to insert into a partitioned Hive table tb_1(a, b, c, d, p1) only columns (a, b) from a select statement. Ex: insert into table tb_1 partition (p1) (a, b) select a, b from tb_2; How can I achieve this?
1
vote
1 answer

Databricks / Spark storage mechanism for Delta Tables, Delta Logs, Partitions etc

I am trying to understand how data is stored and managed in the DataBricks environment. I have a fairly decent understanding of what is going on under the hood but have seen some conflicting information online, therefore would like to get a detailed…
1
vote
1 answer

Hive - incomplete rows in select from managed partitioned table

I need to copy data from a CSV file to a managed partitioned table in Hive. CSV file rows are: id,nome,cognome,ruolo 16,Mike,Maignan,Portiere 23,Fikayo,Tomori,Centrale 24,Simon,Kjaer,Centrale 19,Theo,Hernandez,Terzino ... ------- I…
Moreno
  • 25
  • 2
1
vote
1 answer

How to retain last N partitions for a hive external table?

I need to retain say last 7 partitions and data of a given hive external table. This can be either done via a shell script or a hive hql script. The table is partitioned by intgestion_date=YYYY-MM-DD what would be the best way to find the cutoff…
1
vote
1 answer

Filtering is supported only on partition keys of type string Hive

We recently started facing issues with spark 2.4.4 with hive 1.2.1 version. when we are trying to read data from a table which is partition by string type columns spark.sql("select count(*) from table where #conditions"); Caused by:…
kavetiraviteja
  • 2,058
  • 1
  • 15
  • 35
1
vote
1 answer

Data Loaded wrongly into Hive Partitioned table after adding a new column using ALTER

I already have a Hive partitioned table. I needed to add a new column to the table, so i used ALTER to add the column like below. ALTER TABLE TABLE1 ADD COLUMNS(COLUMN6 STRING); I have my final table load query like this: INSERT OVERWRITE table…
jahan
  • 103
  • 4
  • 19
1
vote
0 answers

Hive Query is going for full table scan when filtering on the partitions from the results of subquery/joins

Why query1 runs for longer time compared to Query2. Hive Source Table Details Columns - 166 Partitions columns - 2 columns (all are int datatypes) Number of Partitions - 3211 partitions Total Records - 19374461064 File Format - ORC --…
Vijiy
  • 1,187
  • 6
  • 21
1
vote
1 answer

Hive load multiple partitioned HDFS file to table

I have some twice-partitioned files in HDFS with the following…
hunterm
  • 247
  • 5
  • 11
1
vote
1 answer

how to make max function in hive query to ignore _HIVE_DEFAULT_PARTITION__

I have a view which uses max to show the latest partition (which is of format 2021-01, 2021-02, 2021-03, 2021-04). The hive table has _HIVE_DEFAULT_PARTITION__ too. When we run the query in Impala, max on partitions gives the correct value of…
1
vote
1 answer

Will HIVE do a full table query with both partition conditions and not partition conditions?

I have a hive table partitioned by one date column name datetime If I do a query like select * from table where datetime = "2021-05-01" and id in (1,2) with extra and id in (1,2) condition, will hive do a full table search? Is it possible to…
theotheo
  • 145
  • 2
  • 9
1
vote
1 answer

Hive Update partition vs MSCK Repair

I have a table with thousands of partition. I want to change all the partition location to diff cluster. Ex: for table test_table and partition day=2021041600 Old location: hdfs://cluster1/dir1/dir2/day=2021041600/\ New location:…
Sumit Singh
  • 15,743
  • 6
  • 59
  • 89