Questions tagged [hive-partitions]

Use for questions about partitions in Hive.

Partitioning is a way of dividing a table into related parts based on the values of partition columns such as date, city, and department. With partitions, a query can read only the relevant portion of the data instead of scanning the whole table.

Partitions are essentially horizontal slices of data that allow larger data sets to be separated into more manageable chunks. In Hive, partitioning is supported for both managed and external tables and is declared in the table definition.
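As a rough illustration of the idea, the sketch below groups rows by their partition-column values and builds the `column=value` directory names that Hive conventionally uses for partitions. It is a minimal model, not Hive's implementation; the function names (`partition_path`, `group_by_partition`), the `/warehouse/sales` root, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def partition_path(table_root, partition_values):
    """Build a Hive-style partition directory: root/col1=val1/col2=val2."""
    parts = [f"{col}={val}" for col, val in partition_values]
    return "/".join([table_root.rstrip("/")] + parts)


def group_by_partition(rows, partition_cols):
    """Group rows (dicts) by the values of their partition columns.

    The partition columns are dropped from each stored row, mirroring the
    usual Hive layout where partition values live in the directory name
    rather than in the data files.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple((col, row[col]) for col in partition_cols)
        data = {k: v for k, v in row.items() if k not in partition_cols}
        groups[key].append(data)
    return dict(groups)


# Hypothetical sample data, partitioned by date and city.
rows = [
    {"id": 1, "amount": 10, "dt": "2020-05-01", "city": "Pune"},
    {"id": 2, "amount": 25, "dt": "2020-05-01", "city": "Pune"},
    {"id": 3, "amount": 7, "dt": "2020-05-02", "city": "Delhi"},
]

groups = group_by_partition(rows, ["dt", "city"])
for key, data in groups.items():
    print(partition_path("/warehouse/sales", key), len(data))
```

A query that filters on `dt` and `city` only needs to read the matching directory, which is why partition pruning makes such queries cheaper.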

144 questions

1 answer

What hashing algorithm does Hive use for partitioning?

I need to understand the algorithm used by Hive to hash partition data. For example, Spark uses Murmur Hashing. Any ideas or resources?

asked Jun 02 '20 at 23:16

Ananya Gupta

2 answers

Hive | Create partition on a date

I need to create an external Hive table on top of a CSV file. The CSV has col1, col2, col3 and col4. My external Hive table should be partitioned on month, but the CSV file doesn't have a month field; col1 is a date field. How can I do this?

hive hiveql hive-partitions

asked May 11 '20 at 08:10

user13516187

1 answer

How can we drop a HIVE table with its underlying file structure, without corrupting another table under the same path?

Assume we have 2 Hive tables created under the same HDFS file path. I want to be able to drop one table WITH its HDFS file path, without corrupting the other table that's in the same shared path. By doing the following: drop table…

hadoop hive hdfs hive-partitions

asked May 04 '20 at 17:50

GeoSal

1 answer

msck repair on a big table takes a very long time

I have a daily ingestion of data into HDFS. From the data in HDFS I generate Hive tables partitioned by date and another column. One day has 130G of data. After generating the data, I run msck repair. Now every msck takes more than 2 hours. In my mind,…

hive hdfs bigtable hive-partitions

asked Apr 29 '20 at 10:06

Gary Wang

1 answer

Hive partition column

We have an Avro partitioned table in Hive. When we query the table, the partition column is displayed at the end. Is there any way to display the partition column first? Eg: select * from tablea Output: Col1 col2 partition_column Expected…

hive hiveql partition hive-partitions

asked Mar 03 '20 at 15:08

user11069271

1 answer

Hive table deduplication across multiple partitions

I am trying to deduplicate a table that may have duplicates across partitions. For example id device_id os country unix_time app_id dt 2 2 3a UK 7 5 2019-12-22 1 2 3a USA 4 5 …

sql hive duplicates hiveql hive-partitions

asked Dec 23 '19 at 11:04

George Annan

0 answers

How to create partitioned and bucketed external table in hive with delta directories?

I created a partitioned and bucketed table in Hive by merging many files. For some reason, that table cannot be accessed from Hive, maybe its metadata is lost, though the data is there along with partitions, delta directories and buckets. I have…

azure hive hdfs external-tables hive-partitions

asked Nov 04 '19 at 05:30

Ayaz49

2 answers

get number of partitions in pyspark

I select all from a table and create a dataframe (df) out of it using PySpark. It is partitioned as: partitionBy('date', 't', 's', 'p'). Now I want to get the number of partitions using df.rdd.getNumPartitions() but it returns a much…

dataframe pyspark rdd hive-partitions

asked Oct 19 '19 at 19:03

Alan

0 answers

Pyspark: insert dataframe into partitioned hive table

Apologies if I'm being really basic here but I need a little Pyspark help trying to dynamically overwrite partitions in a hive table. Tables are drastically simplified, but the issue I'm struggling with is (I hope) clear. I'm pretty new to PySpark…

hive pyspark hive-partitions

asked Oct 14 '19 at 14:39

Amit

1 answer

bash - grabbing the partitions of a hive table using grep and regex

I am trying to get the partition column names of a hive table in bash using grep and regex. I am trying this: hive -e 'show create table employees' | grep -E 'PARTITIONED BY (.*)' This is giving me the result like: PARTITIONED BY ( How do I have…

regex bash hive grep hive-partitions

asked Oct 10 '19 at 03:30

Hemanth

1 answer

Unable to create Hive unique partitions

I am unable to create unique partitions. When I upload data, it creates all the dates as partitions again and again, even when the dates are the same. create table product_order1(id int,user_id int,amount int,product string, city string, txn_date…

date hadoop hive hive-partitions

asked Oct 06 '19 at 07:03

Priyanka

1 answer

Performance of Group By on Partition Column in Hive

I have a table with 4 columns with col4 as the partition column in Hive. This is a huge table with ~9M rows inserted every 5 hours. I have a restriction that I cannot change the design of this table as it is used for other reports as well. CREATE…

hadoop hive cloudera hive-partitions

asked Sep 10 '19 at 12:07

underwood

2 answers

Deletion of Partitions

I am not able to drop a partition in a Hive table. ALTER TABLE db.table drop if exists partition(dt="****-**-**/id=**********"); OK Time taken: 0.564 seconds But the partitions are not getting deleted. Below is what I get when I check the partitions of my…

hive hive-partitions hiveddl

asked Sep 04 '19 at 06:53

Kamal Prasad

2 answers

INSERT OVERWRITE PARTITION () checks if partition exists

I want to check whether a certain partition already exists before running "insert overwrite" on it. I only need to insert when that partition does not exist. How do I modify this query? INSERT OVERWRITE TABLE myname.mytable PARTITION (ds='2019-07-19')

sql hive hive-partitions

asked Jul 19 '19 at 20:52

daydayup

1 answer

How to undo ALTER TABLE ... ADD PARTITION without deleting data

Let's suppose I have two hive tables, table_1 and table_2. I use: ALTER TABLE table_2 ADD PARTITION (col=val) LOCATION [table_1_location] Now, table_2 will have the data in table_1 at the partition where col = val. What I want to do is reverse this…

apache-spark hive partition hive-partitions hiveddl

asked Jun 25 '19 at 05:23

allen kim
