Questions tagged [hadoop-partitioning]

The hadoop-partitioning tag covers questions about how Hadoop decides which key/value pairs are sent to which reducer (partition).
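As a rough illustration of that decision, the sketch below mimics the hash-based rule used by Hadoop's default `HashPartitioner`: clear the sign bit of the key's hash code, then take it modulo the number of reduce tasks. It deliberately avoids any Hadoop dependency so it runs on its own; the class name `HashPartitionSketch`, the helper `partitionFor`, and the sample keys are hypothetical and exist only for this example.

```java
import java.util.Arrays;
import java.util.List;

public class HashPartitionSketch {

    // Hypothetical stand-alone helper (not a Hadoop API): applies the same
    // arithmetic as the default hash partitioning rule to a plain Java object.
    static int partitionFor(Object key, int numReduceTasks) {
        // Masking with Integer.MAX_VALUE clears the sign bit, so a negative
        // hashCode can never produce a negative partition index.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 3; // assumed reducer count for the demo
        List<String> keys = Arrays.asList("vowel", "consonant", "digit", "other");
        for (String key : keys) {
            // Equal keys always map to the same partition index.
            System.out.println(key + " -> partition " + partitionFor(key, numReduceTasks));
        }
    }
}
```

Because the index is always in the range 0 to `numReduceTasks - 1`, a partitioner following this rule cannot address more partitions than there are reducers; a custom partitioner has to respect the same bound.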

339 questions
0
votes
1 answer

Hadoop partitioner getting incorrect reduce count

I'm working on a partitioner today. It's the basic custom-partitioner program in Hadoop. Below is my partitioner code snippet. public class VowelConsPartitioner extends Partitioner { @Override public int getPartition(Text letterType, IntWritable…
0
votes
2 answers

Get max salary employee name using Hadoop MapReduce

I am very new to M/R programs. I have a file in HDFS with data in this structure: EmpId,EmpName,Dept,Salary, 1231,userName1,Dept1,5000 1232,userName2,Dept2,6000 1233,userName3,Dept3,7000 . . …
user1585111
  • 1,019
  • 6
  • 19
  • 35
0
votes
1 answer

DiskErrorException on slave machine - Hadoop multinode

I am trying to process XML files with Hadoop, and I got the following error on invoking a word-count job on the XML files. 13/07/25 12:39:57 INFO mapred.JobClient: Task Id : attempt_201307251234_0001_m_000008_0, Status : FAILED Too many fetch-failures 13/07/25…
0
votes
1 answer

Error on starting HDFS daemons on Hadoop multinode cluster

I hit an issue during Hadoop multi-node setup. As soon as I start my HDFS daemon on the master (bin/start-dfs.sh), I get the logs below on the master: starting namenode, logging to…
Surya
  • 3,408
  • 5
  • 27
  • 35
0
votes
1 answer

Hadoop command line explanation

Can someone explain this syntax to me: bin/hadoop jar hadoop*examples*.jar wordcount /user/hpuser/testHadoop /user/hpuser/testHadoop-output Why are we using jar right after bin/hadoop? What does hadoop*examples*.jar mean? Is wordcount the name of…
Surya
  • 3,408
  • 5
  • 27
  • 35
0
votes
2 answers

Creating more partitions than reducers

When developing locally on my single machine, I believe the default number of reducers is 6. In a particular MR step, I actually divide up the data into n partitions where n can be greater than 6. From what I have observed, it looks like only 6 of…
syker
  • 10,912
  • 16
  • 56
  • 68
0
votes
1 answer

Generating multiple equally sized output files in Hadoop

What are some methods for finding X data ranges in Hadoop so that one can use these ranges as partitions in the reducer step?
syker
  • 10,912
  • 16
  • 56
  • 68
0
votes
1 answer

Is the Hadoop file system a physical file system or a virtual file system?

Is the Hadoop file system a physical file system or a virtual file system?
0
votes
2 answers

Hadoop: distribute partitions to reducers

For load balancing reasons, I want to create more partitions than reducers in a Hadoop environment. Is there a way to assign partitions to specific reducers and, if so, where can I define them? I wrote an individual Partitioner and now want to…
beto8888
  • 45
  • 1
  • 4
0
votes
3 answers

How to work on a specific part of a CSV file uploaded into HDFS?

How do I work on a specific part of a CSV file uploaded into HDFS? I'm new to Hadoop and I have a question: if I export a relational database into a CSV file and then upload it into HDFS, how do I work on a specific part (table) in the file using…
Samy Louize Hanna
  • 821
  • 2
  • 8
  • 15
0
votes
2 answers

How to use the Hadoop MapReduce framework for an OpenCL application?

I am developing an application in OpenCL whose basic objective is to implement a data mining algorithm on a GPU platform. I want to use the Hadoop Distributed File System and execute the application on multiple nodes. I am using MapReduce…
sandeep.ganage
  • 1,409
  • 2
  • 21
  • 47
0
votes
1 answer

How to increase hadoop map tasks by implementing getSplits

I want to process multiline CSV files, and for that I wrote a custom CSVInputFormat. I would like to have about 40 threads processing CSV lines on each Hadoop node. However, when I create a cluster on Amazon EMR with 5 machines (1 master and 4…
mvallebr
  • 2,388
  • 21
  • 36
0
votes
1 answer

How to load key-value data into HBase tables?

Thanks for taking interest in my question. Before I begin, I'd like to let you know that I'm very new to Hadoop & HBase. So far, I find Hadoop very interesting and would like to contribute more in the future. I'm primarily interested in improving…
0
votes
2 answers

Apache Hive: how to identify which column is the partition

I have a set of log files and created a Hive table; now I want to partition the table based on a column. What I don't understand, and have not seen examples of, is how to specify the column/field for the partition. Ex. here is a line from the log…
Integration
  • 337
  • 1
  • 4
  • 15
-1
votes
1 answer

Hive managed vs. external tables: maintainability

Which one is better (in performance and long-run operation) for maintaining loaded data: managed or external? By maintaining, I mean that these tables will frequently have the following operations on a daily basis: Select using partitions…
amr007
  • 29
  • 1
  • 8