Questions tagged [hadoop-partitioning]
Hadoop partitioning covers how Hadoop decides which key/value pairs are sent to which reducer (partition).
339 questions
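As background for the questions below: the default routing is Hadoop's HashPartitioner, whose entire logic fits in one method (sketched here against the mapreduce API, with generic key/value types):

import org.apache.hadoop.mapreduce.Partitioner;

// Default partitioner: non-negative hash of the key, modulo the reducer count.
public class HashPartitioner<K, V> extends Partitioner<K, V> {
    public int getPartition(K key, V value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}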
0 votes, 1 answer
Hadoop Map task/Map object
In theory, the following properties define the number of map/reduce task slots on a data node:
mapred.tasktracker.map.tasks.maximum | mapred.map.tasks.
Also, the number of mapper objects is decided by the number of input splits in the MapReduce job. We…

user3159843 · 1 · 1
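A small sketch of the two properties named in this question (classic MR1 names; the values here are invented): the first caps concurrent map slots per TaskTracker, while the second is only a hint, since the actual mapper count follows the number of input splits.

import org.apache.hadoop.conf.Configuration;

public class SlotConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Hard cap: map tasks one TaskTracker may run concurrently.
        conf.setInt("mapred.tasktracker.map.tasks.maximum", 4);
        // Hint only: the framework still creates one mapper per input split.
        conf.setInt("mapred.map.tasks", 10);
    }
}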
0 votes, 1 answer
HDInsight Azure Blob Storage Data Update
I am considering HDInsight with Hive and data loaded on Azure Blob Storage.
There is a combination of both historic and changing data.
Does the solution mentioned in "Update, SET option in Hive" work with blob storage too?
The below Hive statement…

Srinivas · 2,479 · 8 · 47 · 69
0 votes, 1 answer
Hadoop MapReduce partitioner not invoked
I need help with a MapReduce job: my custom partitioner is never invoked. I have checked everything a million times, but no result. It used to work a while ago; I have no idea why it doesn't now.
Any help would be much appreciated.
I am adding the code (It…

Alexander Komarov · 109 · 2 · 8
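Since the asker's code is truncated, one frequent cause of exactly this symptom is worth sketching: the framework only consults a custom partitioner when the job has more than one reduce task, and the class must be registered on the Job. Types and names below are hypothetical.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class MyPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // Same shape as the default: non-negative hash modulo reducer count.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// In the driver, both lines matter:
// job.setPartitionerClass(MyPartitioner.class);
// job.setNumReduceTasks(4); // with 0 or 1 reducers the partitioner is never consulted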
0 votes, 3 answers
Provide map splits with splits of the same file
How can I provide each line of a file fed to the mapper with splits of the same file?
Basically, what I want to do is:
for each line in file-split {
    for each line in file {
        // process
    }
}
Can I do this using MapReduce in…

Nitin J · 78 · 1 · 2 · 9
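One way to get the nested loop sketched in this question (an assumption, not the asker's code): ship the whole file to every mapper through the distributed cache, then pair each incoming line with every cached line. The file name and types are hypothetical.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CrossJoinMapper extends Mapper<LongWritable, Text, Text, Text> {
    private final List<String> allLines = new ArrayList<>();

    @Override
    protected void setup(Context context) throws IOException {
        // Assumes the driver registered the file, e.g.
        // job.addCacheFile(new java.net.URI("/data/input.txt#input.txt"));
        try (BufferedReader r = new BufferedReader(new FileReader("input.txt"))) {
            String line;
            while ((line = r.readLine()) != null) {
                allLines.add(line);
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String other : allLines) {
            // "process": emit the pair; real comparison logic goes here.
            context.write(value, new Text(other));
        }
    }
}

Note this is quadratic in the file size, so it only stays practical while the cached file fits in memory.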
0 votes, 1 answer
Using Hive to select data within large range partitions
I've come across a problem using Hive to select data within large range partitions.
Here's the HQL I want to execute:
INSERT OVERWRITE TABLE summary_T partition(DateRange='20131222-20131228')
select col1, col2, col3 From RAW_TABLE
where cdate…

Dennis Shen · 61 · 1 · 6
0 votes, 1 answer
Will two different keys go to the same reducer by the default hash partitioner in Hadoop?
As we know, Hadoop guarantees that identical keys coming from different mappers are sent to the same reducer.
But if two different keys have the same hash value, they will definitely go to the same reducer, so will they be sent to the…

Judking · 6,111 · 11 · 55 · 84
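A small experiment (class name invented) that shows the first half of the answer: HashPartitioner assigns partitions by hash modulo the reducer count, so distinct keys can share a reducer, yet reduce() still runs once per distinct key, because grouping compares whole keys, never hashes.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class SamePartitionDemo {
    public static void main(String[] args) {
        HashPartitioner<Text, IntWritable> p = new HashPartitioner<>();
        int reducers = 4;
        // Two keys printing the same number land on the same reducer,
        // but they are still handed to reduce() as separate groups.
        System.out.println(p.getPartition(new Text("apple"), null, reducers));
        System.out.println(p.getPartition(new Text("grape"), null, reducers));
    }
}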
0 votes, 2 answers
Partitioner or MultipleOutputs
I would like to have your opinion regarding Partitioner vs MultipleOutputs.
Suppose I have a file which contains keys as
0:aaa
1:bbb
0:ccc
0:ddd
...
1:zzz
I would like to have 2 files: one file containing keys starting with 0: and the…

JohnRossy · 63 · 5
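On the MultipleOutputs side of this comparison, a hedged sketch (the output names "zeros"/"ones" and all types are assumptions): a single reducer routes each record to one of two named outputs by its key prefix.

import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class PrefixReducer extends Reducer<Text, Text, NullWritable, Text> {
    private MultipleOutputs<NullWritable, Text> out;

    @Override
    protected void setup(Context context) {
        out = new MultipleOutputs<>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Choose the named output from the "0:" / "1:" key prefix.
        String name = key.toString().startsWith("0:") ? "zeros" : "ones";
        for (Text v : values) {
            out.write(name, NullWritable.get(), v);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        out.close(); // required, or the extra files may never be flushed
    }
}

// Driver side, once per named output:
// MultipleOutputs.addNamedOutput(job, "zeros", TextOutputFormat.class,
//         NullWritable.class, Text.class);

The Partitioner alternative would instead force two reduce tasks and rely on part-r-00000/part-r-00001 as the two files.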
0 votes, 3 answers
How to group data from Hive with a specific partition?
I have the following:
hive>show partitions TABLENAME
pt=2012.07.28.08
pt=2012.07.28.09 …

user2935539 · 73 · 2 · 6
0 votes, 0 answers
Hadoop disk usage (intermediate reduce)
I'm new to Hadoop.
I'm using a cluster and I have a disk quota of 15GB.
If I try to execute the wordcount sample on a big dataset (about 25GB), I always receive the exception "The DiskSpace quota of xxxx is exceeded: ".
I checked my disk usage after…
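One standard lever for this situation, offered as a sketch (Hadoop 2 property names; whether it fits the asker's quota depends on the cluster): map-side spill and shuffle files are intermediate data that count against disk space, and compressing them shrinks that footprint.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;

public class CompressShuffleSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Compress intermediate map output to cut spill/shuffle disk usage.
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                SnappyCodec.class, CompressionCodec.class);
    }
}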
0 votes, 1 answer
Sending data from all mappers to all reducers
Before this question is flagged as a duplicate, please read through.
This has been asked many times with no clear answer. Let's say my task is to compute the unigram probability of every word across millions of files. I can emit word counts from…

abhinavkulkarni · 2,284 · 4 · 36 · 54
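One known pattern for getting a mapper-global value to every reducer, sketched under assumptions (the "!TOTAL!" sentinel, the hard-coded reducer count, and the matching custom partitioner are all inventions here): each mapper broadcasts its local total once per partition, keyed so it sorts ahead of ordinary words.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class UnigramMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private long localCount = 0;
    private static final int NUM_REDUCERS = 4; // must match job.setNumReduceTasks

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String word : value.toString().split("\\s+")) {
            context.write(new Text(word), new LongWritable(1));
            localCount++;
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // "!TOTAL!" sorts before letters in raw byte order; a custom
        // partitioner (not shown) must send "!TOTAL!r" to partition r.
        for (int r = 0; r < NUM_REDUCERS; r++) {
            context.write(new Text("!TOTAL!" + r), new LongWritable(localCount));
        }
    }
}

Each reducer then sums the sentinel records it receives, one per mapper, to recover the corpus total before the first real word arrives.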
0 votes, 2 answers
How does Hadoop decide to distribute among buckets/nodes?
I am new to the Map/Reduce and Hadoop framework.
I am running a Hadoop program on a single machine (to try it out).
I have n input files and I want a summary of the words from those files.
I know the map function returns a key/value pair, but how map is…

Palash Kumar · 429 · 6 · 18
0 votes, 0 answers
How to best decide mapper output/reducer input for a huge string
I need to improve my MR jobs, which use HBase as both source and sink.
Basically, I'm reading data from 3 HBase tables in the mapper, writing it out as one huge string for the reducer to do some computation and dump into an HBase table…

Pavan · 658 · 2 · 7 · 28
0 votes, 1 answer
How is input of small size read by a mapper in map-reduce?
I have a map-reduce job whose input is a big data set (let's say of size 100GB). What this map-reduce job does is split the big data into chunks and write separate files, one per data chunk. That is, the output of the job is multiple…

HHH · 6,085 · 20 · 92 · 164
0 votes, 1 answer
How does the input file get split into chunks by the map-reduce framework?
I have an iterative mapreduce job in which, when a chunk, let's say Chunk i, is read by a mapper, some information regarding the records within this chunk is stored in an auxiliary file called F_i. In the next iteration (job), a different mapper…

HHH · 6,085 · 20 · 92 · 164
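The sizing rule itself is compact enough to quote: per file, Hadoop's FileInputFormat clamps the block size between the configured minimum and maximum split sizes. The surrounding variables here are illustrative defaults.

public class SplitSizeSketch {
    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // dfs.blocksize
        long minSize = 1L;                   // mapreduce.input.fileinputformat.split.minsize
        long maxSize = Long.MAX_VALUE;       // mapreduce.input.fileinputformat.split.maxsize
        // The formula used by FileInputFormat.computeSplitSize:
        long splitSize = Math.max(minSize, Math.min(maxSize, blockSize));
        System.out.println(splitSize); // 134217728: one split per block by default
    }
}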
0 votes, 2 answers
Can a reduce task accept compressed data in Hadoop?
We see that map can accept and output both compressed and uncompressed data. I was going through Cloudera training and the teacher mentioned that the input to a reduce task has to be in the form of key/value pairs and thus can't work on compressed data.
Is that right? If that's…

bruceparker · 1,235 · 1 · 17 · 33
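For context on this one: the shuffle decompresses map output before handing it to the reducer, so reducers always consume plain key/value pairs even when the bytes in flight were compressed. Compressing the job's final output is a separate, driver-side choice; a hedged sketch:

import java.io.IOException;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class OutputCompressionSketch {
    public static void main(String[] args) throws IOException {
        Job job = Job.getInstance();
        // Only the files the job writes are gzip-compressed; the reducer's
        // view of its input is unchanged.
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
    }
}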