Highest Voted 'hadoop-partitioning' Questions

1

vote

1 answer

Hadoop streaming KeyFieldBasedPartitioner

I am extracting data from a Freebase dump (title, aliases, type names) into Avro (not yet in this job). I am using MapReduce streaming with Python. This job's reducer expects a type title (which is generally any object title) and a type id reference to…

hadoop hadoop-streaming hadoop-partitioning

asked Dec 02 '14 at 15:05

Ondrej Galbavý

159
1
13

1

vote

1 answer

Hadoop Datanode configuration Cores and RAM

I am using a Hadoop cluster with 9 nodes. I would like to know what the basic datanode configuration in a Hadoop cluster is. I am using the following configuration on the Namenode and Datanode: RAM = 4GB, Cores = 4, Disk = 8 (Total 16GB storage…

hadoop mapreduce cpu ram hadoop-partitioning

asked Oct 07 '14 at 17:09

navaz

125
1
2
15

1

vote

0 answers

reducer always fails and map succeeds

I am running a simple wordcount job on a 1GB text file. My cluster has 8 datanodes and 1 namenode, each with a storage capacity of 3GB. When I run wordcount, the map always succeeds and the reducer throws an error and fails. Please find below…

hadoop mapreduce hadoop-partitioning reducers

asked Oct 05 '14 at 21:53

navaz

125
1
2
15

1

vote

0 answers

Can we read built-in counters in Hadoop for individual tasks

Can we read built-in counters in Hadoop for individual tasks periodically (say every 500 ms or 1 sec) and record them in a file? If so, how? And how do we get the individual task PIDs?

hadoop mapreduce hadoop-streaming hadoop-plugins hadoop-partitioning

asked Sep 12 '14 at 03:08

Srinivas Naik Nenavath

11
1

1

vote

2 answers

Input split for Map function in Hadoop

This is my first implementation in Hadoop. I am trying to implement my algorithm for a probabilistic dataset in MapReduce. In my dataset, the last column will have some id (the number of unique ids in the dataset is equal to the number of nodes in my…

python hadoop hadoop-streaming hadoop2 hadoop-partitioning

asked Sep 08 '14 at 08:33

ds_user

2,139
4
36
71

1

vote

1 answer

How to solve the chainmapper is not applicable for the arguments error while doing job chaining in Mapreduce?

I'm using Hadoop 1.2.1 and Eclipse Juno. I'm trying to chain three map tasks in a single MapReduce job. While writing the MapReduce code in Eclipse, I get an error like "chainmapper is not applicable for the arguments", and I also can't set the input path.…

eclipse hadoop hadoop-streaming hadoop2 hadoop-partitioning

asked Aug 11 '14 at 06:44

Karthick

97
1
1
7

1

vote

2 answers

Output of reducer sent to HDFS whereas map output is stored in data node local disk?

I am a bit confused about HDFS storage and data node storage. Below are my doubts. Map function output is saved to the data node's local disk and reducer output is sent to HDFS. As we all know, data blocks are stored on the data node's local disk…

hadoop hadoop-streaming hadoop-partitioning hadoop2

asked Apr 22 '14 at 11:32

Suresh Babu D.V

11
1
2

1

vote

1 answer

Files through map function in map reduce

Is it possible to somehow pass a set of files through each map function? The requirement is to process each file in parallel for different operations. I am completely new to MapReduce and I am using Java as my programming language.

hadoop mapreduce hadoop-partitioning

asked Feb 27 '14 at 17:47

anuj pradhan

2,777
4
26
31

1

vote

1 answer

MapReduce streaming job with -libjars, custom partitioner fails: "class not found"

I am trying to attach a custom (java) partitioner to my MapReduce streaming job. I am using this command: ../bin/hadoop jar ../contrib/streaming/hadoop-streaming-1.2.1.jar \ -libjars ./NumericPartitioner.jar -D mapred.map.tasks=12 -D…

java hadoop mapreduce streaming hadoop-partitioning

asked Nov 18 '13 at 12:15

SoItBegins

414
1
6
22

1

vote

1 answer

Splits in hadoop with variable-length/non-delimited binary file

I've just started working on a Hadoop-based ingester for OpenStreetMap data. There are a few formats, but I've been targeting a Protocol Buffers based format (note: it's not pure pb). It's looking to me like it would be more efficient to…

hadoop gis openstreetmap hadoop-partitioning

asked Nov 17 '13 at 17:10

Chris B

926
7
16

1

vote

2 answers

Handle uneven distribution of values across keys in Hadoop mapreduce

I am dealing with input log files in Hadoop where the keys are not evenly distributed. This means that the reducers receive an uneven distribution of values. For example, key1 has 1 value and key2 has 1000 values. Is there any way to do the load…

java hadoop mapreduce partitioning hadoop-partitioning

asked Jul 25 '13 at 23:49

udag

41
1
6
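
One idea often suggested for skewed keys like the key1/key2 example in the question above is key salting: a heavy key is split into several sub-keys so that its values land on more than one reducer. The sketch below is only an illustration under stated assumptions, not the asker's code or an accepted answer. The names `salted_key`, `unsalt`, and `partition_for` are hypothetical, and `zlib.crc32` merely stands in for whatever hash the real partitioner uses.

```python
import zlib


def salted_key(key: str, record_seq: int, salt_buckets: int) -> str:
    """Append a salt so one hot key becomes `salt_buckets` distinct keys.

    `record_seq` is any per-record counter; taking it modulo `salt_buckets`
    spreads the records of the same original key round-robin.
    """
    return f"{key}#{record_seq % salt_buckets}"


def unsalt(salted: str) -> str:
    """Recover the original key (needed when merging partial results)."""
    return salted.rsplit("#", 1)[0]


def partition_for(key: str, num_reducers: int) -> int:
    """Stable hash partitioning; crc32 is a stand-in for the real hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


# Hypothetical usage: 1000 values for "key2" become 4 salted keys.
salted = {salted_key("key2", i, 4) for i in range(1000)}
```

Because each salted key only sees part of the data, the partial results per salted key have to be combined again per original key (via `unsalt`) in a second aggregation step. This only works directly for reductions that can be merged, such as sums or counts.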

1

vote

1 answer

Hadoop webuser: No such user

While running a Hadoop multi-node cluster, I got the error message below in my master logs. Can someone advise what to do? Do I need to create a new user, or can I give my existing machine user name here? 2013-07-25 19:41:11,765 WARN …

hadoop mapreduce hadoop-streaming hadoop-plugins hadoop-partitioning

asked Jul 25 '13 at 15:45

Surya

3,408
5
27
35

1

vote

1 answer

How to partition large Hive table with many categories

I want to partition my table in Hive so that for every unique item in the row it creates a partition. There are ~250 partitions for about a 4 billion row table, so I would like to do something like a for loop or a distinct. Here are my thoughts in…

for-loop hive hadoop-partitioning

asked Jul 17 '13 at 19:02

user1807096

25
1
2
7

1

vote

2 answers

Custom Partitioner in Hadoop

I have some data that is keyed by ids in the range of 0 to 200-something million and I need to split it up into buckets for ranges like 0-5mil, 5mil - 10mil, etc. I'm attempting to use a custom partitioner on Hadoop for this final part so that the…

hadoop apache-pig hadoop-partitioning

asked Jul 09 '13 at 17:31

sbilstein

307
3
14
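
The range bucketing described in the question above (0-5mil, 5mil - 10mil, and so on) comes down to an integer division. The sketch below shows only that arithmetic in plain Python, not a Hadoop `Partitioner` implementation. The function name `range_partition` and the default of 41 partitions are assumptions made for illustration, since the question only says the ids go up to "200-something million".

```python
def range_partition(record_id: int,
                    bucket_width: int = 5_000_000,
                    num_partitions: int = 41) -> int:
    """Map an id to a partition index using fixed-width ranges.

    Ids 0..4,999,999 go to partition 0, 5,000,000..9,999,999 to
    partition 1, and so on. Ids beyond the last range are clamped into
    the final partition so the result is always a valid index.
    """
    if record_id < 0:
        raise ValueError("record_id must be non-negative")
    return min(record_id // bucket_width, num_partitions - 1)


# Hypothetical usage: ids on either side of the first boundary.
first = range_partition(4_999_999)   # lands in partition 0
second = range_partition(5_000_000)  # lands in partition 1
```

In a real job the same calculation would sit inside the partitioner's partition method, and the number of reducers would need to be at least the number of ranges for each range to get its own output file.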

1

vote

2 answers

hadoop - how total mappers are determined

I am new to Hadoop and just installed Oracle's VirtualBox and the Hortonworks Sandbox. I then downloaded the latest version of Hadoop and imported the jar files into my Java program. I copied a sample wordcount program and created a new jar file. I…

hadoop hadoop-partitioning

asked Jun 19 '13 at 15:49

Ramesh

765
7
24
52
