Hadoop partitioning covers questions about how Hadoop decides which key/value pairs are sent to which reducer (partition).
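As a rough illustration of that decision, Hadoop's default `HashPartitioner` computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The sketch below mirrors that arithmetic in Python; `hash_partition` is a hypothetical helper name, and `zlib.crc32` is only a stand-in for the key's `hashCode()`, so the concrete partition numbers will not match a real Hadoop job.

```python
import zlib


def hash_partition(key: str, num_reducers: int) -> int:
    """Map a key to a reducer index in the range [0, num_reducers)."""
    # Assumption: crc32 stands in for the key's hashCode(); it is stable across runs.
    key_hash = zlib.crc32(key.encode("utf-8"))
    # Mask to a non-negative 31-bit value, then take the remainder,
    # mirroring (hash & Integer.MAX_VALUE) % numReduceTasks.
    return (key_hash & 0x7FFFFFFF) % num_reducers


# Group some sample map output by the partition each key is assigned to.
pairs = [("apple", 1), ("banana", 1), ("apple", 1), ("cherry", 1)]
assignments = {}
for key, value in pairs:
    assignments.setdefault(hash_partition(key, 3), []).append((key, value))
```

Because the partition depends only on the key, every pair with the same key lands on the same reducer, which is what lets a reducer see all values for a key together.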
Questions tagged [hadoop-partitioning]
339 questions
1 vote, 1 answer
Hadoop streaming KeyFieldBasedPartitioner
I am extracting data from a Freebase dump (title, aliases, type names) into Avro (not yet in this job). I am using MapReduce streaming with Python.
This job's reducer expects a type title (which is generally any object title) and a type id reference to…

Ondrej Galbavý
1 vote, 1 answer
Hadoop Datanode configuration Cores and RAM
I am using a Hadoop cluster with 9 nodes. I would like to know what the basic datanode configuration in a Hadoop cluster is.
I am using the following configuration on the namenode and datanodes.
RAM = 4GB
Cores = 4
Disk = 8 ( Total 16GB storage…

navaz
1 vote, 0 answers
reducer always fails and map succeeds
I am running a simple wordcount job on a 1GB text file. My cluster has 8 datanodes and 1 namenode, each with a storage capacity of 3GB.
When I run wordcount, I can see that the map always succeeds, but the reducer throws an error and fails. Please find below…

navaz
1 vote, 0 answers
Can we read built-in counters in Hadoop for individual tasks
Can we read the built-in counters in Hadoop for individual tasks periodically (say every 500 ms or 1 sec) and record them in a file? If so, how?
How can we get the individual task PIDs?
1 vote, 2 answers
Input split for Map function in Hadoop
This is my first implementation in Hadoop. I am trying to implement my algorithm for a probabilistic dataset in MapReduce. In my dataset, the last column holds an id (the number of unique ids in the dataset is equal to the number of nodes in my…

ds_user
1 vote, 1 answer
How to solve the "ChainMapper is not applicable for the arguments" error while doing job chaining in MapReduce?
I'm using Hadoop 1.2.1 and Eclipse Juno. I'm trying to chain three map tasks in a single MapReduce job. While writing the MapReduce code in Eclipse, I'm getting an error like "ChainMapper is not applicable for the arguments", and I also can't set the input path.…

Karthick
1 vote, 2 answers
Output of reducer sent to HDFS whereas map output is stored on data node local disk?
I am a bit confused about HDFS storage and data node storage. Below are my doubts.
Map function output is saved to the data node's local disk and reducer output is sent to HDFS. As we all know, data blocks are stored on the data nodes' local disks…

Suresh Babu D.V
1 vote, 1 answer
Files through map function in map reduce
Is it possible to somehow pass a set of files through each map function? The requirement is to process each file in parallel for different operations. I am completely new to MapReduce and I am using Java as my programming language.

anuj pradhan
1 vote, 1 answer
MapReduce streaming job with -libjars, custom partitioner fails: "class not found"
I am trying to attach a custom (Java) partitioner to my MapReduce streaming job. I am using this command:
../bin/hadoop jar ../contrib/streaming/hadoop-streaming-1.2.1.jar \
-libjars ./NumericPartitioner.jar -D mapred.map.tasks=12 -D…

SoItBegins
1 vote, 1 answer
Splits in Hadoop with variable-length/non-delimited binary file
I've just started working on a Hadoop-based ingester for OpenStreetMap data. There are a few formats, but I've been targeting a Protocol Buffers based format (note: it's not pure pb).
It's looking to me like it would be more efficient to…

Chris B
1 vote, 2 answers
Handle uneven distribution of values across keys in Hadoop MapReduce
I am dealing with input log files in Hadoop where the keys are not evenly distributed. This means that the reducers receive an uneven distribution of values. For example, key1 has 1 value and key2 has 1000 values.
Is there any way to do the load…

udag
1 vote, 1 answer
Hadoop webuser: No such user
While running a Hadoop multi-node cluster, I got the error message below in my master logs. Can someone advise what to do? Do I need to create a new user, or can I give my existing machine user name here?
2013-07-25 19:41:11,765 WARN
…

Surya
1 vote, 1 answer
How to partition large Hive table with many categories
I want to partition my table in Hive so that it creates a partition for every unique item in the row. There are ~250 partitions for a table of about 4 billion rows, so I would like to do something like a for loop or a distinct. Here are my thoughts in…

user1807096
1 vote, 2 answers
Custom Partitioner in Hadoop
I have some data that is keyed by ids in the range of 0 to 200-something million, and I need to split it up into buckets for ranges like 0-5mil, 5mil-10mil, etc.
I'm attempting to use a custom partitioner on Hadoop for this final part so that the…

sbilstein
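The range bucketing described in the question above (ids from 0 up to roughly 200 million, buckets 5 million wide) can be sketched as integer division on the id. This is a minimal Python sketch of the arithmetic only, not a Hadoop `Partitioner`; `range_partition` and `BUCKET_WIDTH` are hypothetical names, and clamping overflow ids to the last reducer is an assumption, not something the question specifies.

```python
BUCKET_WIDTH = 5_000_000  # assumed bucket size, taken from the "0-5mil" example


def range_partition(record_id: int, num_reducers: int,
                    bucket_width: int = BUCKET_WIDTH) -> int:
    """Return the reducer index for an id using fixed-width ranges."""
    if record_id < 0:
        raise ValueError("ids are expected to be non-negative")
    # Ids 0..4,999,999 fall in bucket 0, 5,000,000..9,999,999 in bucket 1, etc.
    bucket = record_id // bucket_width
    # Assumption: ids beyond the last configured range go to the last reducer.
    return min(bucket, num_reducers - 1)
```

With this scheme the number of reducers has to cover the id range (for example, at least 41 reducers for ids up to 204,999,999), otherwise the last reducer absorbs every overflow bucket.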
1 vote, 2 answers
hadoop - how total mappers are determined
I am new to Hadoop and just installed Oracle's VirtualBox and the Hortonworks Sandbox. I then downloaded the latest version of Hadoop and imported the jar files into my Java program. I copied a sample wordcount program and created a new jar file. I…

Ramesh