Questions tagged [hive-partitions]

To be used for questions regarding partitions in hive.

Partitioning is a way of dividing a table into related parts based on the values of partitioned columns such as date, city, and department. Using partition, it is easy to query a portion of the data.

Partitions are essentially horizontal slices of data which allow larger sets of data to be separated into more manageable chunks. In Hive, partitioning is supported for both managed and external tables in the table definition as seen below.

144 questions
1
vote
1 answer

How partitioning and clustered by works in Hive table?

I'm trying to understand below query by using that how data is going to be placed. CREATE TABLE mytable ( name string, city string, employee_id int ) PARTITIONED BY (year STRING, month STRING, day STRING) CLUSTERED BY…
nut
  • 51
  • 7
1
vote
1 answer

Can I use regular expression in PARTITION BY?

( ResponseRgBasketId STRING, RawStandardisedLoadDateTime TIMESTAMP, InfoMartLoadDateTime TIMESTAMP, Operaame STRING, RequestTimestamp TIMESTAMP, RequestSiteId …
1
vote
1 answer

Dynamic partitioned table in hive not updating the recent partitions

I have a folder in gcs bucket with a folder structure as Xyz/Abc/dt=03-12-2021/file_03-12-2021.csv Xyz/Abc/dt=04-12-2021/file_04-12-2021.csv I am trying to create a dynamic partitioned table on top of the folder by executing the below queries Set…
1
vote
1 answer

What happens if I move Hive table data files before moving the table?

I am trying to move the location of a table to a new directory. Let's say the original location is /data/dir. For example, I am trying something like this: hadoop fs -mkdir /data/dir_bkp hadoop fs -mv /data/dir/* /data/dir_bkp I then do hive…
formicaman
  • 1,317
  • 3
  • 16
  • 32
1
vote
1 answer

into Hive table - Non Partitioned table to Partitioned table having multiple partitions - Cannot insert into target table because column number/types

When I tried to insert into a Partiotioned table I am getting the bellow error: SemanticException [Error 10044]: Line 1:23 Cannot insert into target table because column number/types are different : Table insclause-0 has 6 columns, and the 3 columns…
1
vote
0 answers

Rationale behind partition specific schema in Hive/Glue tables

I'm trying to understand the rationale behind the partition specific schema managed for Hive/Glue tables. Albeit, I couldn't find any documentation, specifically talking about this but during my search, I found a couple of Hive JIRAs (as attached in…
Dilraj Singh
  • 131
  • 1
  • 8
1
vote
1 answer

Hive: why to use partition by in selects?

I cannot understand partitioning concept in Hive completely. I understand what are partitions and how to create them. What I cannot get is why people are writing select statements which have "partition by" clause like it is done here: SQL most…
MiamiBeach
  • 3,261
  • 6
  • 28
  • 54
1
vote
1 answer

Hive External Table - Drop Partition

Facing a weird issue. Alter table command to drop partition works well for > or < or >= or <= signs but not for = check. Working command: ALTER TABLE XYZ DROP PARTITION(bizdate>'20231230'); Command that's not working and throwing an error stating…
Dasarathy D R
  • 335
  • 2
  • 7
  • 20
1
vote
1 answer

copy data from one table to another partitioning table

%hive INSERT INTO NEWPARTITIONING partition(year(L_SHIPDATE)) select * from LINEITEM; I want to copy the data from line item to the partitioning table NEWPARTITIONING but I got the following error: line 1:54 cannot recognize input near ')'…
Rex
  • 11
  • 1
1
vote
1 answer

Is there a workaround to my attempted Hive insert

I copy the structure of schema2.card_master over to schema1.card_master using hive> create table schema1.card_master like schema2.card_master; That works, and it is partitioned as was the original on a field. This new table has hundreds of fields…
Mark Ginsburg
  • 2,139
  • 4
  • 17
  • 31
1
vote
1 answer

Hive Partitioned Table - trying to load data from one table to a partitioned table in my Hive and getting [Error 10044]*

So I have a table with 20 columns and i created another partitioned table - using 2 partition values, now when i try to load data from my table with 20 columns into the other partitioned tables i get error saying my partitioned table has more…
Ruth Ineh
  • 43
  • 1
  • 10
1
vote
1 answer

Hive - Is it mandatory to have '=' for external table to consider as partition

I am new to Hive and have a below basic question: I am trying to create external table on HDFS directory at location /projects/score/output/scores_2020-06-30.gzip but it is not considering it as partition. Should developer need to change…
ShaanGreat
  • 23
  • 6
1
vote
1 answer

Hive count elements of the max partition column

I'm struggling with a query that may look simple but which is causing me a lot of trouble. SELECT COUNT(*) FROM mytable where partition_column IN (SELECT MAX(partition_column) FROM mytable ) mytable is a 2To Hive External table, partitioned by the…
Eric C
  • 165
  • 12
1
vote
1 answer

Does msck repair trigger table statistics generation

i would like to know whether running msck repair on table will generate table statistics for CBO when: hive.cbo.enable=true hive.stats.autogather=true or do i have to use analyze compute statistics. Thanks
Yaniv Irony
  • 149
  • 4
1
vote
0 answers

Error while importing huge table from Oracle into HDFS: 'org.apache.spark.shuffle.FetchFailedException: Read error or truncated source'

I have been trying to replicate a couple of huge tables from an Oracle DB into HDFS, I use PySpark and JDBC to read the tables from the source and I save the tables as Hive partitioned tables. I have managed to replicate and save these tables to…