Questions tagged [hive-partitions]

Use this tag for questions about partitions in Hive.

Partitioning is a way of dividing a table into related parts based on the values of partition columns such as date, city, and department. With partitions, a query can read only the relevant portion of the data instead of scanning the whole table.

Partitions are essentially horizontal slices of data that allow larger data sets to be separated into more manageable chunks. In Hive, partitioning is supported for both managed and external tables and is declared in the table definition.
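
As a rough illustration of the idea, the sketch below mimics in plain Python how Hive lays out a partitioned table as key=value directories and how a filter on the partition column narrows the set of directories to read. It does not call Hive. The table root, column names, rows, and helper functions are all hypothetical, and details such as the escaping of special characters in partition values are left out.

```python
from collections import defaultdict


def partition_path(table_root, partition_cols, row):
    """Build a Hive-style partition directory, e.g. root/dt=2017-12-31."""
    parts = [f"{col}={row[col]}" for col in partition_cols]
    return "/".join([table_root.rstrip("/")] + parts)


def write_partitioned(table_root, partition_cols, rows):
    """Group rows by partition directory.

    As in Hive, the partition column values live in the directory name,
    so they are dropped from the stored rows.
    """
    layout = defaultdict(list)
    for row in rows:
        path = partition_path(table_root, partition_cols, row)
        data = {k: v for k, v in row.items() if k not in partition_cols}
        layout[path].append(data)
    return dict(layout)


def prune(layout, **filters):
    """Keep only directories whose key=value segments match every filter."""
    wanted = {f"{k}={v}" for k, v in filters.items()}
    return {p: r for p, r in layout.items() if wanted <= set(p.split("/"))}


# Hypothetical sample data: a transactions table partitioned by date (dt).
rows = [
    {"id": 1, "amount": 10.0, "state": "CA", "dt": "2017-12-30"},
    {"id": 2, "amount": 25.5, "state": "NY", "dt": "2017-12-30"},
    {"id": 3, "amount": 7.25, "state": "CA", "dt": "2017-12-31"},
]
layout = write_partitioned("/warehouse/transactions", ["dt"], rows)

# A filter on dt touches a single directory; the other one is never read.
selected = prune(layout, dt="2017-12-31")
```

In this sketch a filter on dt only has to look at one directory, while a filter on state would still have to read every directory, because state is not a partition column here.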

144 questions
0 votes, 2 answers

Choosing Partition Column

I have a huge data set of transactions. I need to choose a partitioning column: either transaction_date (grows every day) or state (limited number of values). Which is the better choice, and why?
Mugdha
  • 112
  • 9
0 votes, 1 answer

Hive partitioned view not showing partitions info

I have created a partitioned view in Hive as below: create view if not exists view_name PARTITIONED ON (date) as select col1, col2, date from table1 union all select col1, col2, date from table2. The underlying tables are partitioned on the 'date' column.…
0 votes, 1 answer

Can the Hive metastore virtually partition data based on a column value without physically changing the directory structure?

As an example, consider that I have data on all the major sports events that have happened. The schema is given below: EventName, Date, Month, Year, City. This data is physically structured in HDFS by year, date, month. Now I want to create virtual partitions on that based…
0 votes, 1 answer

PySpark - data overwritten in Partition

I am seeing a situation where, when I save a PySpark DataFrame to a Hive table partitioned on multiple columns, it overwrites the data in the subpartition too. Or maybe I am only assuming it is a subpartition. I want to treat the column 'month' as…
0 votes, 2 answers

Can I move data from one Hive partition to another partition of the same table?

My partition is based on year/month/date. Using SimpleDateFormat with the week-year pattern created a wrong partition. The data for the date 2017-31-12 was moved to 2018-31-12 because YYYY was used in the date format. SimpleDateFormat sdf = new…
Aditya Goel
  • 13
  • 2
  • 6
0 votes, 2 answers

Can Hive load data from external location which is not on HDFS?

I am trying to understand whether an external table in Hive can have its location outside of HDFS. Specifically, I want to create my external table on top of a Google storage location (gs://bucket-name/table-partitions).
user7763294
0 votes, 2 answers

hive setting hive.optimize.sort.dynamic.partition

I am trying to insert into a Hive table with dynamic partitions. The same query has been running fine for the last few days, but it now gives the error below. Diagnostic Messages for this Task:…
Jaikumar Obla
  • 31
  • 2
  • 5
0 votes, 1 answer

Hive Runtime Error: Unable to deserialize reduce input key

I am trying to run an INSERT into a partitioned table using a query that involves GROUP BY: 'set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.execution.engine=tez; INSERT OVERWRITE TABLE table1 PARTITION (date)…
dheee
  • 1,588
  • 3
  • 15
  • 25
-2 votes, 1 answer

How to duplicate a Hive partitioned table

I have a table in Hive with a date column, and the table is partitioned on that column. Say there are 300 part files as of now, and only one record is inserted each day, so my table contains 300 records. Now I want to create a duplicate table with…
Chinna
  • 1