Hadoop partitioning covers questions about how Hadoop decides which key/value pairs are sent to which reducer (partition).
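As a rough illustration of the idea in this description, the sketch below mimics the formula used by Hadoop's default HashPartitioner, `(hash & Integer.MAX_VALUE) % numReduceTasks`, in plain Python. The helper names are hypothetical, and a Java `String.hashCode()`-style hash stands in for the key's real `hashCode()`, so the partition numbers will not necessarily match what a real job assigns.

```python
def java_string_hash(s: str) -> int:
    """Java String.hashCode()-style hash as a signed 32-bit integer.

    Assumption: all characters are in the Basic Multilingual Plane,
    so ord(ch) matches Java's UTF-16 code unit.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as signed, like a Java int.
    return h - (1 << 32) if h >= (1 << 31) else h


def hash_partition(key: str, num_reducers: int) -> int:
    """Pick a reducer index the way a hash partitioner does."""
    # Masking off the sign bit keeps the modulo result non-negative.
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


if __name__ == "__main__":
    for key in ["apple", "banana", "cherry", "apple"]:
        print(key, "->", hash_partition(key, 3))
```

The point of the sketch is that equal keys always land on the same reducer, while the assignment of different keys depends only on their hash, not on their sort order.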
Questions tagged [hadoop-partitioning]
339 questions
0
votes
1 answer
Hadoop Total order Partitioning
Why use total order partitioning in Hadoop? In which scenarios do we need total order partitioning? My understanding is that, with multiple reducers, each reducer's result will be sorted by key. Then why do we need total order partitioning? Would…

Learn Hadoop
- 2,760
- 8
- 28
- 60
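One way to see the difference the question above asks about is a small simulation, sketched here in plain Python rather than with Hadoop itself. Each simulated reducer sorts its own keys; with hash-style partitioning the concatenated outputs are not globally sorted, while with range partitioning they are. The split points `[20, 60]` and all function names are hypothetical; Hadoop's TotalOrderPartitioner normally reads its split points from a partition file built by sampling the input.

```python
from bisect import bisect_right


def range_partition(key, split_points):
    """Return the index of the key range that contains `key`."""
    return bisect_right(split_points, key)


def run_reducers(keys, num_reducers, partition_fn):
    """Send each key to a partition, then sort inside each partition."""
    buckets = [[] for _ in range(num_reducers)]
    for key in keys:
        buckets[partition_fn(key)].append(key)
    return [sorted(bucket) for bucket in buckets]


def concatenated(outputs):
    """Read the reducer outputs back in partition order."""
    return [key for part in outputs for key in part]


if __name__ == "__main__":
    keys = [42, 7, 19, 88, 3, 56, 71, 25]
    split_points = [20, 60]  # hypothetical boundaries for 3 reducers

    hashed = run_reducers(keys, 3, lambda k: k % 3)
    ranged = run_reducers(keys, 3, lambda k: range_partition(k, split_points))

    print("hash :", hashed, "->", concatenated(hashed))
    print("range:", ranged, "->", concatenated(ranged))
```

In both runs every individual reducer output is sorted; only the range-partitioned run is also sorted across reducers, which is the property total order partitioning is meant to provide.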
0
votes
1 answer
custom partitioner in Hadoop error java.lang.NoSuchMethodException:- ()
I am trying to write a custom partitioner that allocates each unique key to a single reducer. This was after the default HashPartitioner failed:
Alternative to the default HashPartitioner provided with Hadoop
I keep getting the following error. It has…

zaranaid
- 65
- 1
- 13
0
votes
0 answers
Isn't the term shuffling in MapReduce misleading?
I think the term shuffle refers to randomly reordering elements in a sequence [1]. Therefore, the first time I saw shuffling in MapReduce, I thought it was trying to distribute the workload uniformly across nodes for load-balancing purposes. However, after…

Lingxi
- 14,579
- 2
- 37
- 93
0
votes
0 answers
How to handle uneven partitions in Spark
I have a data frame that writes 300 GB of records to S3.
All the data is split into 2K partitions.
.partitionBy("DataPartition", "PartitionYear", "PartitionStatement")
But the issue is that some of the partitions hold a very large amount of data (40 GB) and…

Sudarshan kumar
- 1,503
- 4
- 36
- 83
0
votes
1 answer
How to constrain Hive query output to always be a single file
I created a Hive table using the query below, and I insert data into this table daily using the second query shown below.
create EXTERNAL table IF NOT EXISTS DB.efficacy
(
product string,
TP_Silent INT,
TP_Active INT,
server_date…

Ashish Mittal
- 643
- 3
- 12
- 32
0
votes
2 answers
Multiple reducers without running a partitioner in MapReduce
I am trying to understand the concept of running multiple reducers in an MR job, and I learned that it is the partitioner that decides which (key, value) pairs go to which reducer.
My question is:
Can we run multiple reducers without running…

CuriousMind
- 8,301
- 22
- 65
- 134
0
votes
1 answer
One file per partition (coalesce per partition) while inserting data into a Hive table
I have a table created in Hive and stored in an S3 location.
It has about 10 columns and is partitioned on 3 columns: month, year and city, in that order.
I am running a Spark job that creates a dataframe (2 billion rows) and writes it into this table.
val…

dreddy
- 463
- 1
- 7
- 21
0
votes
0 answers
How to get recently created partitions in a Hive table?
I have a table called EMPLOYEE with columns ID, NAME, DESIGNATION, CITY, COUNTRY, CONTINENT, with three-level partitioning on CONTINENT, COUNTRY, CITY.
Now I need to find recently created partitions, say those created after a specific timestamp.
Note: Assuming access…

Shashank V C
- 153
- 1
- 1
- 9
0
votes
1 answer
Can I get access to a full block in the mapper?
Usually the record reader passes data to the mapper line by line, or n lines at a time. Can a full block be accessed in the mapper? That is, can the record reader give the mapper a full block instead of individual lines? Does this approach make sense?
Thanks

shujaat
- 279
- 6
- 17
0
votes
1 answer
How to merge Hive partitioned data into one large file?
I have a Hive table partitioned on date and hour columns. When I load the data, it creates 24 files. I want to merge these 24 files into one file. Can anyone suggest a solution?

user3890017
- 21
- 1
- 3
0
votes
0 answers
Handle Hive partitioned table running beyond physical memory limits
I have to create a Hive partitioned table, but the job runs beyond physical memory limits.
Is there a way to handle this issue? I can neither change the container memory nor use less data from the source.
Will temporary tables help here, i.e. creating 12…

DrSD
- 151
- 2
- 12
0
votes
1 answer
What happens at the backend when we alter a table in Hive
When we alter a table in Hive, for example by changing the partition, what happens to the table? Does it reformat the table, or does it create new data for the new partition?

Monika Samant
- 47
- 7
0
votes
0 answers
How to partition a Hive table column every thousand rows
I have a Hive table with millions of rows, and I would like to partition it on one column, 'id', which is unique. Creating a partition on that unique column is not good practice because it will create a very large number of files, and…

Nomad
- 751
- 4
- 13
- 34
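The arithmetic behind "one partition per thousand rows" can be sketched as a derived bucket value computed from `id`. This is only an illustration in plain Python, not Hive syntax, and it assumes the ids are non-negative integers that are reasonably dense; `bucket_for` and `rows_per_bucket` are hypothetical names.

```python
from collections import Counter


def bucket_for(row_id: int, rows_per_bucket: int = 1000) -> int:
    """Map an id to a coarse bucket: ids 0-999 -> 0, 1000-1999 -> 1, ..."""
    return row_id // rows_per_bucket


if __name__ == "__main__":
    ids = range(0, 3500)  # stand-in for the unique 'id' column
    sizes = Counter(bucket_for(i) for i in ids)
    print(dict(sizes))
```

Partitioning on such a derived value, instead of on `id` itself, keeps the number of partitions at roughly the row count divided by 1000. If the ids are sparse, the buckets will hold fewer than 1000 rows each.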
0
votes
1 answer
File Storage in Hadoop
In Hadoop, suppose I have a file A.txt that contains some sample data, say:
Hello how are you? I am studying hadoop partitioning. Hadoop is interesting to learn and has good opportunities etc...
How does this data get stored in blocks? As…

jack0989
- 39
- 2
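For the question above, a minimal sketch of fixed-size block splitting may help. HDFS cuts a file into blocks by byte offset (the block size is configurable, 128 MB by default in recent versions), without looking at line or sentence boundaries, and the last block may be smaller. The 16-byte block size and the function name below are purely illustrative assumptions so the effect is visible on a short string.

```python
from typing import List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut `data` into consecutive chunks of at most `block_size` bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


if __name__ == "__main__":
    text = b"Hello how are you? I am studying hadoop partitioning."
    # Tiny block size for illustration only; real HDFS blocks are far larger.
    for number, block in enumerate(split_into_blocks(text, 16)):
        print(number, block)
```

Because the cut points ignore the content, a word or record can straddle two blocks; reassembling the blocks in order gives back the original bytes.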
0
votes
0 answers
Join - Pig scripts
I am new to Pig scripts. I need help joining 'B' and 'E'. Below is my script.
A = LOAD ....
PAPS_1 = FILTER A BY (dataMap#'corr_id_' is NOT null);
B = FOREACH PAPS_1 GENERATE dataMap#'corr_id_' as id, dataMap#'response' as resp,…

Anjanaa
- 31
- 1
- 7