Hadoop partitioning deals with questions about how Hadoop decides which key/value pairs are sent to which reducer (partition).
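A minimal sketch of the default rule: Hadoop's stock `HashPartitioner` assigns a key to partition `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The Python below only simulates that arithmetic. The helper names are hypothetical, and `java_string_hash` mimics `java.lang.String.hashCode()` purely for illustration; Hadoop's `Text` keys define their own byte-based `hashCode`, so real partition numbers can differ.

```python
def java_string_hash(s):
    """Mimic java.lang.String.hashCode() for BMP text: h = 31*h + c, as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # wrap around like Java int overflow
    return h - (1 << 32) if h >= (1 << 31) else h


def hash_partition(key_hash, num_reducers):
    """Same arithmetic as HashPartitioner: clear the sign bit, then take the modulus."""
    return (key_hash & 0x7FFFFFFF) % num_reducers


for key in ["hello", "world", "hadoop"]:
    print(key, "->", hash_partition(java_string_hash(key), 4))
```

For example, `"hello".hashCode()` is 99162322, so with 4 reducers the key goes to partition 2. The sign-bit mask is there because `hashCode()` can be negative, and Java's `%` would then produce a negative partition number.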
Questions tagged [hadoop-partitioning]
339 questions
1 vote, 1 answer
How to add one extra partition to an external Hive table?
I have a Hive table like the one below:
create external table transaction(
id int,
name varchar(60),
month string
)
PARTITIONED BY (
year string,
transaction_type_code varchar(20)
)
STORED AS PARQUET
LOCATION 'hdfs://xyz';
I am…

Manoj Kumar Dhakad
1 vote, 1 answer
How to drop rows from a partitioned Hive table?
I need to drop specific rows from a partitioned Hive table. The rows to delete match certain conditions, so I cannot simply drop entire partitions. Let's say the table Table has three columns: partner, date and…

Dr Potato
1 vote, 1 answer
How do partitioning and CLUSTERED BY work in a Hive table?
I'm trying to understand how the query below determines where the data is placed.
CREATE TABLE mytable (
name string,
city string,
employee_id int )
PARTITIONED BY (year STRING, month STRING, day STRING)
CLUSTERED BY…

nut
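A rough sketch of where one row of `mytable` would land. Hive writes each partition to a nested `year=.../month=.../day=...` subdirectory, and bucketing further splits each partition into a fixed number of bucket files. Because the question's `CLUSTERED BY` clause is cut off, the code assumes, hypothetically, `CLUSTERED BY (employee_id) INTO 4 BUCKETS` and a table location of `/warehouse/mytable`. The bucket formula mirrors Hive's legacy (bucketing version 1) scheme for an INT column, where the hash is the value itself; Hive 3 introduced a Murmur3-based bucketing version, so bucket numbers there will differ.

```python
# Hypothetical settings: the real CLUSTERED BY clause is truncated in the question.
TABLE_LOCATION = "/warehouse/mytable"  # assumed table location
NUM_BUCKETS = 4                        # assumed: CLUSTERED BY (employee_id) INTO 4 BUCKETS


def partition_dir(location, year, month, day):
    """Partition columns become nested key=value directories, in declaration order."""
    return f"{location}/year={year}/month={month}/day={day}"


def bucket_for_int(value, num_buckets=NUM_BUCKETS):
    """Legacy Hive bucketing for an INT column: hash = value, sign bit cleared, modulo buckets."""
    return (value & 0x7FFFFFFF) % num_buckets


def placement(row):
    """Return (partition directory, bucket index) for one row of mytable."""
    directory = partition_dir(TABLE_LOCATION, row["year"], row["month"], row["day"])
    return directory, bucket_for_int(row["employee_id"])


row = {"name": "Ann", "city": "Pune", "employee_id": 10,
       "year": "2021", "month": "03", "day": "15"}
print(placement(row))  # ('/warehouse/mytable/year=2021/month=03/day=15', 2)
```

Only the partition columns decide the directory and only the clustering column decides the bucket; `name` and `city` are simply stored inside the chosen file. Under these assumptions, employee IDs 10 and 14 on the same day share a bucket file because both are 2 modulo 4.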
1 vote, 1 answer
Hive script failing with a heap space error when processing too many partitions
My script fails with a heap space error when processing too many partitions. To avoid the issue, I am trying to insert all the partitions into a single partition, but I get the error below:
FAILED: SemanticException [Error 10044]: Line 1:23 Cannot…

Never_Give_Up
1 vote, 1 answer
Joining a partitioned, bucketed Hive table with a bucketed but non-partitioned table
I have two tables:
q6_cms_list_key1 (bucketed by cm and se, partitioned by tr_dt): 99,000,000,000 rows
q6_cm_first_visit (bucketed by cm and se): 25,000,000,000 rows
I am creating another table using the conditions below:
insert into table…

vashi
1 vote, 1 answer
Hive: why use PARTITION BY in SELECT statements?
I don't fully understand the partitioning concept in Hive.
I understand what partitions are and how to create them. What I don't get is why people write SELECT statements with a "partition by" clause, as is done here: SQL most…

MiamiBeach
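In a SELECT, `PARTITION BY` normally appears inside an `OVER (...)` window clause, for example `ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC)`. It groups rows at query time for the window function and has nothing to do with how the table is partitioned on disk; unlike `GROUP BY`, it also keeps every input row. The Python below is a minimal emulation of `ROW_NUMBER()` to show that behavior; the column names `dept` and `salary` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def row_number_over(rows, partition_col, order_col, descending=False):
    """Emulate ROW_NUMBER() OVER (PARTITION BY partition_col ORDER BY order_col)."""
    # Sort by the ORDER BY column first, then stably by the partition column,
    # so rows inside each partition keep the requested order.
    ordered = sorted(rows, key=itemgetter(order_col), reverse=descending)
    ordered = sorted(ordered, key=itemgetter(partition_col))
    result = []
    for _, group in groupby(ordered, key=itemgetter(partition_col)):
        for n, row in enumerate(group, start=1):
            result.append({**row, "row_number": n})  # every input row survives
    return result


rows = [{"dept": "a", "salary": 300}, {"dept": "b", "salary": 100}, {"dept": "a", "salary": 500}]
for r in row_number_over(rows, "dept", "salary", descending=True):
    print(r)
```

A common use is keeping only `row_number = 1` to get the top row per group, which is why such queries show up in "most ... per group" answers.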
1 vote, 1 answer
Can I create buckets in a Hive External Table?
I am creating an external table that refers to ORC files in an HDFS location. Those ORC files are stored in such a way that the external table is partitioned by date (mapping to date-wise folders on HDFS as partitions).
However, I am wondering if I…

Sai Geetha M N
1 vote, 1 answer
How to insert the Hive partition column and value into the data (Parquet) file?
How can I write the partition key/value pair into each Parquet file while inserting data into a Hive/Impala table?
Hive table DDL:
[
create external table db.tbl_name ( col1 string, col2 string)
Partitioned BY (date_col string)
STORED AS…

Peace_Dude
1 vote, 1 answer
Drop partitions in Hive with different date formats in the same partition column
I have two value formats in a string-typed partition column:
yyyyMMdd
yyyy-MM-dd
For example, the partition column has values such as 20200301, 2020-03-05, 2020-05-07, 20200701, etc.
I need to drop partitions less than 20200501 with a DDL statement…

Keerthana Somu
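A sketch of one way to handle the mixed formats: normalize each value to `yyyyMMdd` before comparing, then issue one `ALTER TABLE ... DROP IF EXISTS PARTITION` per matching value. Normalizing matters because a plain string comparison against `'20200501'` would also match `'2020-05-07'`, since `'-'` sorts before every digit. The table name `my_table` and column name `dt` are hypothetical; the partition values are the ones from the question.

```python
def normalize(value):
    """Map both yyyyMMdd and yyyy-MM-dd to yyyyMMdd, which sorts chronologically as text."""
    return value.replace("-", "")


def partitions_to_drop(values, cutoff="20200501"):
    """Return the original spellings, so each DROP matches the stored partition value exactly."""
    return [v for v in values if normalize(v) < cutoff]


def drop_statements(table, column, values):
    """Build one HiveQL DROP statement per partition value (table/column are hypothetical)."""
    return [f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({column}='{v}');" for v in values]


existing = ["20200301", "2020-03-05", "2020-05-07", "20200701"]
for stmt in drop_statements("my_table", "dt", partitions_to_drop(existing)):
    print(stmt)
```

With the sample values, only `20200301` and `2020-03-05` are selected for dropping.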
1 vote, 1 answer
HDFS: exact meaning of dfs.block.size
In our cluster, dfs.block.size is set to 128M, but I have seen quite a few files with a size of 68.8M, which seems an odd size. I am confused about how exactly this configuration option affects what files look like on HDFS.
First…

Boyu Zhang
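A minimal sketch of how the setting applies: `dfs.block.size` is the maximum size of a block, not a fixed allocation. A file is cut into as many full blocks as fit, plus one final block holding the remainder, and HDFS does not pad that last block. So a 68.8M file with a 128M block size is stored as a single 68.8M block. The hypothetical function below only does that arithmetic (sizes in MB); it does not talk to HDFS.

```python
def hdfs_block_sizes(file_size_mb, block_size_mb=128):
    """Split a file size into HDFS-style blocks: full blocks plus an unpadded final block."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * int(full_blocks)
    if remainder:
        sizes.append(remainder)  # last block holds only what is left
    return sizes


print(hdfs_block_sizes(68.8))  # one partial block
print(hdfs_block_sizes(300))   # two full blocks plus a 44 MB block
```

The size of each file itself is decided by whatever wrote it (for example, how much data a task emitted), not by the block size.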
1 vote, 1 answer
How to call a Partitioner in Hadoop v0.21
In my application, I want to create as many reducer jobs as possible based on the keys. My current implementation writes all the keys and values to a single (reducer) output file. To solve this, I have used a partitioner, but I cannot call…

Kal
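A sketch of the idea, assuming the set of keys is known before the job starts. In a real job two things have to happen: the partitioner class is registered with `Job.setPartitionerClass(...)`, and the number of reducers is raised with `Job.setNumReduceTasks(...)`. The default is a single reduce task, which on its own would put all output in one file. The Python below only simulates the partition function; `build_key_index` and `key_partition` are hypothetical names.

```python
def build_key_index(keys):
    """Give each distinct key a partition number, in first-seen order."""
    index = {}
    for key in keys:
        index.setdefault(key, len(index))
    return index


def key_partition(key, key_index, num_reducers):
    """Route a key to its own reducer when num_reducers >= number of distinct keys."""
    return key_index[key] % num_reducers


idx = build_key_index(["apple", "pear", "apple", "plum"])
print({k: key_partition(k, idx, 3) for k in idx})  # three reducers, one key each
print({k: key_partition(k, idx, 1) for k in idx})  # one reducer: everything goes to 0
```

Each reducer writes its own output file, so one key per reducer also means one key per output file.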
1 vote, 1 answer
Reducer Selection in Hive
I have the following record sets to process:
1000, 1001, 1002 to 1999,
2000, 2001, 2002 to 2999,
3000, 3001, 3002 to 3999
I want to process these record sets using Hive in such a way that reducer-1 processes data 1000 to 1999…

Suvo
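What the question describes is range partitioning rather than hashing. A minimal sketch of such a partition function, assuming three reducers numbered from 0 (so the question's reducer-1 is partition 0) and ids between 1000 and 3999. For context, Hive's `DISTRIBUTE BY` hashes the given expression to pick a reducer, so it groups rows but does not by itself guarantee this ordered mapping; in plain MapReduce the logic would go into a custom Partitioner.

```python
def range_partition(record_id, num_reducers=3, lower=1000, width=1000):
    """Map 1000-1999 -> 0, 2000-2999 -> 1, 3000-3999 -> 2 (for the default arguments)."""
    partition = (record_id - lower) // width
    if not 0 <= partition < num_reducers:
        raise ValueError(f"record id {record_id} is outside the configured ranges")
    return partition


for rid in (1000, 1999, 2000, 3500):
    print(rid, "->", range_partition(rid))
```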
1 vote, 1 answer
Hive date-partitioned table: streaming data in S3 with mixed dates
I have extensive experience working with Hive partitioned tables, using Hive 2.x. I was interviewing for a Big Data Solution Architect role and was asked the question below.
Question: How would you ingest streaming data into a Hive table…

ZeroDecibels
1 vote, 0 answers
How to speed up a Spark GROUP BY query on a large Hive table?
I have an input table intab:
create table intab (
ds string comment 'date partition field'
, id1 string comment 'id1'
, id2 string comment 'id2'
, n int comment 'n'
) comment 'test'
partition by list(ds)(partition default);
I need to…

Changwang Zhang
1 vote, 0 answers
Spark save job is taking a long time
I am trying to save a DataFrame to an HDFS location, but the save is taking a long time. The step before this joins two tables using Spark SQL. I need to know why the save has four stages and how to improve its performance. I have attached…

Talib aman