Hadoop partitioning covers questions about how Hadoop decides which key/value pairs are sent to which reducer (partition).
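As a rough illustration of that decision, the sketch below reproduces in plain Java the formula used by Hadoop's default HashPartitioner: clear the sign bit of the key's hash code, then take the remainder by the number of reduce tasks. The class name `HashPartitionSketch`, the method `partitionFor`, and the sample keys are hypothetical and exist only for this example; no Hadoop classes are used.

```java
public class HashPartitionSketch {

    // Same arithmetic as Hadoop's default HashPartitioner:
    // mask off the sign bit so the value is non-negative,
    // then take the remainder by the number of reduce tasks.
    public static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Hypothetical keys and reducer count, chosen only for illustration.
        String[] keys = {"apple", "banana", "cherry"};
        int numReduceTasks = 3;
        for (String key : keys) {
            System.out.println(key + " -> reducer " + partitionFor(key, numReduceTasks));
        }
    }
}
```

Every pair with the same key lands on the same reducer, because the result depends only on the key's hash code and the reducer count.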
Questions tagged [hadoop-partitioning]
339 questions
0
votes
1 answer
hadoop partitioner getting incorrect reduce count
I'm working on a partitioner today. It's the basic program for Hadoop custom partitioners. Below is my partitioner code snippet.
public class VowelConsPartitioner extends Partitioner {
@Override
public int getPartition(Text letterType, IntWritable…

Santosh Batta
- 1
- 1
0
votes
2 answers
Get max salary employee name using hadoop map reduce
I am very new to M/R programs. I have a file in HDFS with data in this structure:
EmpId,EmpName,Dept,Salary,
1231,userName1,Dept1,5000
1232,userName2,Dept2,6000
1233,userName3,Dept3,7000
.
. …

user1585111
- 1,019
- 6
- 19
- 35
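The comparison behind the max-salary question above can be sketched in plain Java, using only the three sample rows visible in the excerpt. This is not a MapReduce job: in a real job the same comparison would typically run in a reducer that receives all records. The class name `MaxSalarySketch` and the method `maxSalaryName` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class MaxSalarySketch {

    // Returns the EmpName of the row with the highest Salary,
    // or null if no row has a parseable salary.
    public static String maxSalaryName(List<String> rows) {
        String bestName = null;
        long bestSalary = Long.MIN_VALUE;
        for (String row : rows) {
            String[] fields = row.split(",");
            if (fields.length < 4) {
                continue; // not enough columns
            }
            long salary;
            try {
                salary = Long.parseLong(fields[3].trim());
            } catch (NumberFormatException e) {
                continue; // skips the header line and malformed rows
            }
            if (salary > bestSalary) {
                bestSalary = salary;
                bestName = fields[1].trim();
            }
        }
        return bestName;
    }

    public static void main(String[] args) {
        // Only the rows shown in the question excerpt; the real file has more.
        List<String> rows = Arrays.asList(
                "EmpId,EmpName,Dept,Salary,",
                "1231,userName1,Dept1,5000",
                "1232,userName2,Dept2,6000",
                "1233,userName3,Dept3,7000");
        System.out.println(maxSalaryName(rows));
    }
}
```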
0
votes
1 answer
DiskErrorException on slave machine - Hadoop multinode
I am trying to process XML files with Hadoop and got the following error when invoking a word-count job on the XML files.
13/07/25 12:39:57 INFO mapred.JobClient: Task Id : attempt_201307251234_0001_m_000008_0, Status : FAILED
Too many fetch-failures
13/07/25…

Surya
- 3,408
- 5
- 27
- 35
0
votes
1 answer
Error on starting HDFS daemons on hadoop Multinode cluster
I have an issue during Hadoop multi-node setup. As soon as I start my HDFS daemon on the master (bin/start-dfs.sh),
I get the logs below on the master:
starting namenode, logging to…

Surya
- 3,408
- 5
- 27
- 35
0
votes
1 answer
Hadoop command line explanation
Can someone explain this syntax to me:
bin/hadoop jar hadoop*examples*.jar wordcount /user/hpuser/testHadoop /user/hpuser/testHadoop-output
Why do we use jar right after bin/hadoop?
What does hadoop*examples*.jar mean?
Is wordcount the name of…

Surya
- 3,408
- 5
- 27
- 35
0
votes
2 answers
Creating more partitions than reducers
When developing locally on my single machine, I believe the default number of reducers is 6. In a particular MR step, I actually divide up the data into n partitions where n can be greater than 6. From what I have observed, it looks like only 6 of…

syker
- 10,912
- 16
- 56
- 68
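For the situation described in the question above, one common workaround can be sketched as follows. In Hadoop, the value returned by a Partitioner's getPartition must be smaller than the number of reduce tasks, so n logical partitions cannot map one-to-one onto fewer reducers. A minimal sketch, assuming the logical partition is already computed as an integer, folds it onto the available reducers with a modulo. The class name `LogicalPartitionSketch` and the method `reducerFor` are hypothetical, and the code is plain Java rather than an actual Partitioner subclass.

```java
public class LogicalPartitionSketch {

    // Folds a logical partition index onto the available reduce tasks.
    // Math.floorMod keeps the result in [0, numReduceTasks) even if the
    // logical index is negative.
    public static int reducerFor(int logicalPartition, int numReduceTasks) {
        return Math.floorMod(logicalPartition, numReduceTasks);
    }

    public static void main(String[] args) {
        int numReduceTasks = 6;     // reducer count mentioned in the question
        int logicalPartitions = 10; // hypothetical n > 6
        for (int p = 0; p < logicalPartitions; p++) {
            System.out.println("logical partition " + p + " -> reducer " + reducerFor(p, numReduceTasks));
        }
    }
}
```

With this mapping a single reducer can receive several logical partitions, so the reducer would still need to tell them apart, for example by carrying the logical partition in the key.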
0
votes
1 answer
Generating multiple equally sized output files in Hadoop
What are some methods for finding X data ranges in Hadoop so that one can use these ranges as partitions in the reducer step?

syker
- 10,912
- 16
- 56
- 68
0
votes
1 answer
Is the Hadoop file system a physical file system or a virtual file system?
Is the Hadoop file system a physical file system or a virtual file system?

user2183044
- 33
- 2
- 6
0
votes
2 answers
hadoop distribute partitions to reducer
For load balancing reasons, I want to create more partitions than reducers in a Hadoop environment. Is there a way to assign partitions to specific reducers and, if so, where can I define them? I wrote an individual Partitioner and now want to…

beto8888
- 45
- 1
- 4
0
votes
3 answers
How to work on a specific part of a CSV file uploaded into HDFS?
How to work on a specific part of a CSV file uploaded into HDFS?
I'm new to Hadoop and have a question: if I export a relational database into a CSV file and then upload it into HDFS, how do I work on a specific part (table) of the file using…

Samy Louize Hanna
- 821
- 2
- 8
- 15
0
votes
2 answers
How to use the Hadoop MapReduce framework for an OpenCL application?
I am developing an application in OpenCL whose basic objective is to implement a data mining algorithm on a GPU platform. I want to use the Hadoop Distributed File System and want to execute the application on multiple nodes. I am using MapReduce…

sandeep.ganage
- 1,409
- 2
- 21
- 47
0
votes
1 answer
How to increase hadoop map tasks by implementing getSplits
I want to process multiline CSV files and for that I wrote a custom CSVInputFormat.
I would like to have about 40 threads processing CSV lines on each hadoop node. However, when I create a cluster on Amazon EMR with 5 machines (1 master and 4…

mvallebr
- 2,388
- 21
- 36
0
votes
1 answer
How to load key-value data into HBase tables?
Thanks for taking interest in my question. Before I begin, I'd like to let you know that I'm very new to Hadoop & HBase. So far, I find Hadoop very interesting and would like to contribute more in the future.
I'm primarily interested in improving…

MapReddy Usthili
- 288
- 1
- 7
- 23
0
votes
2 answers
Apache Hive how to identify which column is the partition
I have a set of log files and created a Hive table. Now I want to partition the table based on a column. What I don't understand, and have not seen examples of, is how to specify the column/field for the partition.
Ex.: here is a line from the log…

Integration
- 337
- 1
- 4
- 15
-1
votes
1 answer
Hive Managed vs External tables maintainability
Which one is better (performance-wise and operationally in the long run) for maintaining loaded data: managed or external?
By maintaining, I mean that these tables will frequently undergo the following operations on a daily basis:
Select using partitions…

amr007
- 29
- 1
- 8