Hadoop partitioning covers questions about how Hadoop decides which key/value pairs are sent to which reducer (partition).
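As a rough illustration of that decision: Hadoop's default `HashPartitioner` computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The Python sketch below mirrors that formula outside Hadoop. `stable_hash`, `hash_partition`, and `route` are names made up for this example, and CRC32 is only a deterministic stand-in for Java's `hashCode()`, so the partition numbers will not match a real cluster.

```python
import zlib


def stable_hash(key: str) -> int:
    """Hypothetical stand-in for Java's key.hashCode(); CRC32 is used only because it is deterministic."""
    return zlib.crc32(key.encode("utf-8"))


def hash_partition(key: str, num_reducers: int) -> int:
    """Mirror the HashPartitioner formula: mask to a non-negative int, then take the modulus."""
    return (stable_hash(key) & 0x7FFFFFFF) % num_reducers


def route(pairs, num_reducers):
    """Group (key, value) pairs by the reducer index their key maps to."""
    buckets = {i: [] for i in range(num_reducers)}
    for key, value in pairs:
        buckets[hash_partition(key, num_reducers)].append((key, value))
    return buckets
```

Because the partition depends only on the key, every pair with the same key ends up in the same bucket, which is the property a reducer relies on.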
Questions tagged [hadoop-partitioning]
339 questions
0
votes
1 answer
Gathering multiple mappers' results sorted at the Reducer in Hadoop
I have multiple very large files (nearly 500MB) as input to my MR program. I divide (split) these files into equal-size partitions. Each Mapper gets a single partition of a file.
Mapper: Key = (filename, partition_number) and Value = (character stream of…

Sumit
0
votes
1 answer
Can a slave node have multiple blocks of the same file in Hadoop?
Say I have a Hadoop cluster where one node is the Master node and the other is a Data node. The slave node is an 8-core machine, just to make sure there are enough cores to process jobs in parallel. Can I still split the file into say 3 blocks and…

Sheel Pancholi
0
votes
1 answer
TotalOrderPartition with ChainMapper
I have a ChainMapper with 2 mappers associated with it. I am trying to perform a TotalOrderPartition on the last mapper in the chain without much success.
Is there a way to enforce partitioning based on some sampling on the Nth mapper in the…

bitan
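The general idea behind sampling-based total-order partitioning can be sketched outside Hadoop: take a sample of the keys, pick boundary keys from it, and send each key to the range it falls in. The sketch below is an assumption-laden illustration, not the `TotalOrderPartitioner` source, and it does not address the ChainMapper part of the question. The function names `choose_split_points` and `range_partition` are hypothetical.

```python
import bisect


def choose_split_points(sample, num_partitions):
    """Pick num_partitions - 1 boundaries at evenly spaced positions in the sorted sample."""
    ordered = sorted(sample)
    step = len(ordered) / num_partitions
    return [ordered[int(step * i)] for i in range(1, num_partitions)]


def range_partition(key, split_points):
    """Return the index of the key range that contains key."""
    return bisect.bisect_right(split_points, key)
```

Since partition indices grow with the key, concatenating the sorted output of partition 0, 1, 2, and so on yields a globally sorted result.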
0
votes
1 answer
How to sort a column in a data set in descending order using Java Hadoop MapReduce?
My data file is:
Utsav Chatterjee Dangerous Soccer Coldplay 4
Rodney Purtle Awesome Football Maroon5 3
Michael Gross Amazing Basketball Iron Maiden 6
Emmanuel Ezeigwe Cool Pool Metallica 5
John Doe Boring Golf …

Utsav Chatterjee
0
votes
1 answer
Creating custom key value for mappers in Hadoop from file
I have a file of size 50MB (complete text data without spaces). I want to partition this data in such a way that each mapper gets 5MB of data. The mapper should get data in (K,V) format, where the key is the partition number (like 1, 2, ...) and the value is the plain…

Sumit
0
votes
0 answers
Hadoop Streaming: How to partition output into subfolders?
To be specific, for example, given
hadoop jar hadoop-streaming.jar \
-input myInputDirs \
-output myOutputDir \
-mapper /bin/cat \
-reducer /usr/bin/wc
Where myInputDirs has a dated subfolder structure of
input_dir/yyyy/mm/dd/part-*
I…

Osiris
0
votes
1 answer
How to make a UNION in HIVE over two EXTERNAL TABLES which point to the same file
I'm trying to write a Hive script which creates two external tables, both pointing to the same file LOCATION with different regular expressions (filters). When I try to make a UNION between them, the results aren't as expected.
The first…

marcos
0
votes
1 answer
Why is `getNumPartitions()` not giving me the correct number of partitions specified by `repartition`?
I have a textFile in an RDD like so: sc.textFile().
I try to repartition the RDD in order to speed up processing:
sc.repartition().
No matter what value I pass in, the number of partitions does not seem to change, as indicated by:
RDD.getNumPartitions()…

makansij
0
votes
1 answer
HashPartition in MapReduce
Objective:
Implement HashPartition and check the number of reducers that get created automatically.
Any help and any sample code is always appreciated for this purpose.
What I did:
I ran a MapReduce program with Hash Partition implemented…

Ritab
0
votes
1 answer
How to deal with .gz input files with Hadoop?
Please allow me to provide a scenario:
hadoop jar test.jar Test inputFileFolder outputFileFolder
where
test.jar sorts info by key, time, and place
inputFileFolder contains multiple .gz files, each .gz file is about 10GB
outputFileFolder…

frankilee
0
votes
3 answers
Insert partitioned data into partitioned Hive table
I have stored the data in HDFS using Pig MultiStorage with the column id.
So the data is stored as
/output/1/part-0000
/output/2/
/output/3/
Now I have created a partitioned table in Hive and I want to load the data from the /output folder into this…

wazza
0
votes
1 answer
HIVE: Empty buckets getting created after partitioning in HDFS
I was trying to create partitions and buckets using Hive.
I set some of the properties:
set hive.enforce.bucketing = true;
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
Below is the code for creating…

user182944
0
votes
0 answers
Hadoop KeyComposite and Combiner
I am doing a secondary sort in Hadoop 2.6.0, following this tutorial:
https://vangjee.wordpress.com/2012/03/20/secondary-sorting-aka-sorting-values-in-hadoops-mapreduce-programming-paradigm/
I have the exact same code, but now I am trying to…

ie8888
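For readers unfamiliar with the pattern the question refers to, secondary sort is commonly built from three pieces: a composite key (natural key plus the field to sort on), a partitioner that looks only at the natural key, and a sort over the full composite key. The Python sketch below simulates those pieces in memory. It is not the tutorial's code and does not cover the combiner part of the question; the function names and the CRC32 hash are assumptions made for this example.

```python
import zlib
from itertools import groupby


def partition(composite_key, num_reducers):
    """Partition on the natural key only, so all of its values reach one reducer."""
    natural, _secondary = composite_key
    return zlib.crc32(natural.encode("utf-8")) % num_reducers


def secondary_sort(records, num_reducers):
    """records: iterable of ((natural, secondary), value) pairs.

    Returns {reducer_index: [(natural, [values ordered by secondary]), ...]}.
    """
    buckets = {i: [] for i in range(num_reducers)}
    for key, value in records:
        buckets[partition(key, num_reducers)].append((key, value))

    result = {}
    for reducer, items in buckets.items():
        # Sort by the full composite key: natural key first, then secondary field.
        items.sort(key=lambda kv: kv[0])
        # Group by the natural key only, as a grouping comparator would.
        result[reducer] = [
            (natural, [value for _key, value in group])
            for natural, group in groupby(items, key=lambda kv: kv[0][0])
        ]
    return result
```

Each reducer then sees one group per natural key, with the values already ordered by the secondary field.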
0
votes
3 answers
Split input to a reducer in Hadoop
This question is related to my other question, "Hadoop handling data skew in reducer".
However, I would like to ask whether there are configuration settings available so that if, say, the max reducer memory is reached, then a new reducer is spawned off…

sunny
0
votes
2 answers
Hadoop handling data skew in reducer
I am trying to determine whether there are hooks available in the Hadoop API (Hadoop 2.0.0 MRv1) to handle data skew for a reducer.
Scenario: I have a custom composite key and partitioner in place to route data to reducers. In order to deal with the…

sunny