Highest Voted 'hadoop-partitioning' Questions

0 votes

1 answer

Gathering multiple mappers' results, sorted, at the reducer in Hadoop

I have multiple very large files (nearly 500MB) as input to my MR program. I divide (split) these files into equal-size partitions, and each mapper gets a single partition of a file. Mapper: Key = (filename, partition_number) and Value = (character stream of…

asked Apr 01 '16 at 05:47

Sumit (reputation 27)
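One way to get each file's pieces back in order is to emit the filename as the map output key and (partition_number, text) as the value, then order the pieces inside the reducer. The sketch below is plain Java with no Hadoop dependency. `ReassembleSketch`, `Piece` and `reassemble` are hypothetical names, and it assumes all pieces of one file fit in reducer memory, which may not hold for ~500MB files. In a real reducer Hadoop reuses the value object across iterations, so each value would have to be copied before being buffered.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical reducer-side reassembly: key = filename,
// value = (partition_number, text). Plain Java, no Hadoop types.
final class ReassembleSketch {
    // Hypothetical value type pairing a partition number with its text.
    static final class Piece {
        final int partitionNumber;
        final String text;

        Piece(int partitionNumber, String text) {
            this.partitionNumber = partitionNumber;
            this.text = text;
        }
    }

    // What reduce(filename, pieces) could do: buffer every piece,
    // order by partition number, and concatenate. Memory grows with file size.
    static String reassemble(Iterable<Piece> pieces) {
        List<Piece> buffer = new ArrayList<>();
        for (Piece p : pieces) {
            buffer.add(p); // in Hadoop, copy the value here (objects are reused)
        }
        buffer.sort(Comparator.comparingInt(p -> p.partitionNumber));
        StringBuilder sb = new StringBuilder();
        for (Piece p : buffer) {
            sb.append(p.text);
        }
        return sb.toString();
    }
}
```

If buffering is too costly, a secondary sort on a composite (filename, partition_number) key lets the framework deliver the pieces already ordered instead.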

0 votes

1 answer

Can a slave node have multiple blocks of the same file in Hadoop?

Say I have a Hadoop cluster where one node is the master node and the other is a data node. The slave node is an 8-core machine, just to make sure there are enough cores to process jobs in parallel. Can I still split the file into, say, 3 blocks and…

hadoop mapreduce hadoop2 hadoop-partitioning

asked Mar 31 '16 at 15:43

Sheel Pancholi (reputation 621)

0 votes

1 answer

TotalOrderPartition with ChainMapper

I have a ChainMapper with 2 mappers associated with it. I am trying to perform a TotalOrderPartition on the last mapper in the chain without much success. Is there a way to enforce partitioning based on some sampling on the Nth mapper in the…

hadoop mapreduce hadoop2 hadoop-partitioning bigdata

asked Mar 11 '16 at 07:14

bitan (reputation 444)
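Total order partitioning comes down to two steps: choose n - 1 split points from a sample of keys, then send each key to the partition whose key range contains it. The plain-Java sketch below models both steps. `TotalOrderSketch`, `splitPoints` and `partitionFor` are hypothetical names, and the evenly spaced split points are a simplification, not Hadoop's exact `InputSampler` logic. Because partitioning is applied to the map output, with a ChainMapper it sees the keys emitted by the last mapper in the chain, so the sample would also need to come from those keys rather than from the raw input.

```java
import java.util.Arrays;

// Hypothetical plain-Java model of total-order partitioning (no Hadoop dependency).
final class TotalOrderSketch {
    // Picks numPartitions - 1 split points, evenly spaced through the sorted
    // sample. Integer division keeps every index below sample.length.
    static String[] splitPoints(String[] sample, int numPartitions) {
        String[] sorted = sample.clone();
        Arrays.sort(sorted);
        String[] splits = new String[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            int idx = (int) ((long) sorted.length * i / numPartitions);
            splits[i - 1] = sorted[idx];
        }
        return splits;
    }

    // Binary search over the sorted split points. Assuming distinct split
    // points, the result is the number of split points <= key, so smaller
    // keys always land in lower-numbered partitions.
    static int partitionFor(String key, String[] splits) {
        int pos = Arrays.binarySearch(splits, key) + 1;
        return pos < 0 ? -pos : pos;
    }
}
```

For example, a sample of the keys a to f with 3 partitions gives the split points c and e, so a goes to partition 0, c and d to partition 1, and e or anything later to partition 2.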

0 votes

1 answer

How to sort a column of a data set in descending order using Java Hadoop MapReduce?

My data file is:
Utsav Chatterjee Dangerous Soccer Coldplay 4
Rodney Purtle Awesome Football Maroon5 3
Michael Gross Amazing Basketball Iron Maiden 6
Emmanuel Ezeigwe Cool Pool Metallica 5
John Doe Boring Golf …

java sorting hadoop mapreduce hadoop-partitioning

asked Mar 05 '16 at 20:48

Utsav Chatterjee (reputation 181)
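A descending sort means making the numeric column the sort key and reversing the comparator. In Hadoop this is usually done by emitting the number as the map output key and registering a decreasing sort comparator, for example `job.setSortComparatorClass(LongWritable.DecreasingComparator.class)` when the key is a `LongWritable`. With more than one reducer and the default partitioner, each reducer's output is only sorted locally. The plain-Java sketch below models only the ordering. It assumes the value to sort on is the last whitespace-separated token of each line, and `DescendingSortSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical plain-Java model of "sort by the last column, descending".
final class DescendingSortSketch {
    // Assumes the sortable value is the last whitespace-separated token.
    static int lastColumn(String line) {
        String[] tokens = line.trim().split("\\s+");
        return Integer.parseInt(tokens[tokens.length - 1]);
    }

    static List<String> sortDescending(List<String> lines) {
        List<String> copy = new ArrayList<>(lines);
        // reverseOrder() flips the ascending order of the extracted key,
        // which is what a decreasing sort comparator does in the shuffle.
        copy.sort(Comparator.comparing(DescendingSortSketch::lastColumn, Comparator.reverseOrder()));
        return copy;
    }
}
```

Applied to the first four records of the data file above, this puts Michael Gross (6) first and Rodney Purtle (3) last.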

0 votes

1 answer

Creating custom key-value pairs for mappers in Hadoop from a file

I have a file of size 50MB (complete text data without spaces). I want to partition this data in such a way that each mapper gets 5MB of data. The mapper should get data in (K, V) format, where the key is the partition number (like 1, 2, ...) and the value is the plain…

java hadoop mapreduce hadoop-partitioning bigdata

asked Feb 18 '16 at 06:31

Sumit (reputation 27)
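With a 50MB file and 5MB pieces the arithmetic is 50 / 5 = 10 splits, numbered 1 to 10. Hadoop computes the split size as max(minSize, min(maxSize, blockSize)), so capping the maximum, for example with `FileInputFormat.setMaxInputSplitSize(job, 5L * 1024 * 1024)`, should give 5MB splits. Note that `TextInputFormat` keys are byte offsets, so a partition-number key would need a custom `InputFormat`/`RecordReader`. The sketch below models only the split boundaries in plain Java. `FixedSplitSketch` and `Split` are hypothetical names, and Hadoop's own split rules (record boundaries, slop on the last split) are not modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of cutting one file into fixed-size pieces.
final class FixedSplitSketch {
    static final class Split {
        final int partitionNumber; // 1-based, as in "partition number (like 1, 2, ...)"
        final long offset;         // first byte covered by the split
        final long length;         // number of bytes in the split

        Split(int partitionNumber, long offset, long length) {
            this.partitionNumber = partitionNumber;
            this.offset = offset;
            this.length = length;
        }
    }

    static List<Split> splits(long fileSize, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Split> out = new ArrayList<>();
        int partition = 1;
        for (long offset = 0; offset < fileSize; offset += splitSize) {
            // The last split is shorter when fileSize is not a multiple of splitSize.
            out.add(new Split(partition++, offset, Math.min(splitSize, fileSize - offset)));
        }
        return out;
    }
}
```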

0 votes

0 answers

Hadoop Streaming: How to partition output into subfolders?

To be specific, for example, given

    hadoop jar hadoop-streaming.jar \
      -input myInputDirs \
      -output myOutputDir \
      -mapper /bin/cat \
      -reducer /usr/bin/wc

where myInputDirs has a dated subfolder structure of input_dir/yyyy/mm/dd/part-*, I…

hadoop mapreduce hadoop-streaming hadoop2 hadoop-partitioning

asked Jan 20 '16 at 23:37

Osiris (reputation 1,007)

0 votes

1 answer

How to make a UNION in Hive over two external tables which point to the same file

I'm trying to write a Hive script which creates two external tables, both of them pointing to the same file LOCATION with different regular expressions (filters). When I try to make a UNION between them, the results aren't as expected. The first…

hadoop hive hiveql hadoop-partitioning

asked Dec 26 '15 at 15:56

marcos (reputation 21)

0 votes

1 answer

Why is `getNumPartitions()` not giving me the correct number of partitions specified by `repartition`?

I have a textFile in an RDD like so: sc.textFile(). I try to repartition the RDD in order to speed up processing: sc.repartition(). No matter what I put in for , it does not seem to change, as indicated by: RDD.getNumPartitions()…

apache-spark pyspark partition hadoop-partitioning

asked Dec 16 '15 at 00:10

makansij (reputation 9,303)

0 votes

1 answer

HashPartition in MapReduce

Objective: Implement HashPartition and check the number of reducers that get created automatically. Any help and sample code is appreciated. What I did: I ran a MapReduce program with Hash Partition implemented…

hadoop mapreduce hadoop-partitioning

asked Nov 09 '15 at 15:10

Ritab (reputation 37)
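Hadoop's default `HashPartitioner` computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The partitioner does not create reducers. The count comes from `job.setNumReduceTasks(n)` (property `mapreduce.job.reduces`, default 1) and is passed in as `numReduceTasks`. The sketch below reproduces that formula in plain Java so its distribution can be checked without a cluster. `HashPartitionSketch` and `histogram` are hypothetical names.

```java
// Hypothetical plain-Java copy of the formula used by Hadoop's HashPartitioner
// (org.apache.hadoop.mapreduce.lib.partition.HashPartitioner); no Hadoop dependency.
final class HashPartitionSketch {
    static int getPartition(Object key, int numReduceTasks) {
        // Masking with Integer.MAX_VALUE clears the sign bit, so a negative
        // hashCode() still yields a partition in [0, numReduceTasks).
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Counts how many keys land in each partition, e.g. to spot empty reducers.
    static int[] histogram(Iterable<?> keys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (Object k : keys) {
            counts[getPartition(k, numReduceTasks)]++;
        }
        return counts;
    }
}
```

For example, with 3 reducers the `Integer` keys 0 to 9 split 4/3/3, because an `Integer`'s hash code is its own value.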

0 votes

1 answer

How to deal with .gz input files with Hadoop?

Please allow me to provide a scenario: hadoop jar test.jar Test inputFileFolder outputFileFolder, where test.jar sorts info by key, time, and place; inputFileFolder contains multiple .gz files, each about 10GB; outputFileFolder…

hadoop zip gzip hadoop2 hadoop-partitioning

asked Nov 05 '15 at 15:27

frankilee (reputation 77)
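Gzip is not a splittable compression format in Hadoop, so each .gz file is read by a single map task however large it is, while an uncompressed (or bzip2) file of the same size is cut into several splits. The plain-Java estimate below shows the difference. `MapTaskEstimate` is a hypothetical name, the 128MB split size is an assumption (the Hadoop 2 default block size), and smaller rules such as split slop are ignored.

```java
// Hypothetical rough estimate of map tasks for a set of input files.
final class MapTaskEstimate {
    static long mapTasks(long[] fileSizes, long splitSize, boolean splittable) {
        long tasks = 0;
        for (long size : fileSizes) {
            // Splittable input: ceiling(size / splitSize), at least one task per file.
            // Non-splittable input (e.g. .gz): exactly one task per file.
            tasks += splittable ? Math.max(1, (size + splitSize - 1) / splitSize) : 1;
        }
        return tasks;
    }
}
```

For a hypothetical set of ten 10GB files with 128MB splits, that gives 10 map tasks for gzip versus 800 for splittable input.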

0 votes

3 answers

Insert partitioned data into a partitioned Hive table

I have stored the data in HDFS using Pig MultiStorage with the column id, so the data is stored as /output/1/part-0000, /output/2/, /output/3/. Now I have created a partitioned table in Hive and I want to load the data from the /output folder into this…

hadoop hive apache-pig hadoop-partitioning

asked Oct 29 '15 at 11:20

wazza (reputation 770)

0 votes

1 answer

HIVE: Empty buckets getting created after partitioning in HDFS

I was trying to create partitions and buckets using Hive. I set the following properties: set hive.enforce.bucketing = true; SET hive.exec.dynamic.partition = true; SET hive.exec.dynamic.partition.mode = nonstrict; Below is the code for creating…

hadoop hive bigdata hadoop-partitioning

asked Oct 15 '15 at 03:09

user182944 (reputation 7,897)

0 votes

0 answers

Hadoop KeyComposite and Combiner

I am doing a secondary sort in Hadoop 2.6.0, following this tutorial: https://vangjee.wordpress.com/2012/03/20/secondary-sorting-aka-sorting-values-in-hadoops-mapreduce-programming-paradigm/ I have the exact same code, but now I am trying to…

hadoop hadoop-streaming hadoop2 hadoop-partitioning hadoop-plugins

asked Oct 04 '15 at 08:35

ie8888 (reputation 171)
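A secondary sort rests on three pieces: a sort comparator over the whole composite key, a partitioner that hashes only the natural key, and a grouping comparator that treats keys with the same natural key as one reduce group. In a Hadoop job these are set through `job.setSortComparatorClass`, `job.setPartitionerClass` and `job.setGroupingComparatorClass`. A combiner added on top may run zero or more times and must emit the same key and value types as the mapper. The plain-Java sketch below models the three pieces. `SecondarySortSketch` and `CompositeKey` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of secondary sort (no Hadoop dependency).
final class SecondarySortSketch {
    // Hypothetical composite key: the natural key plus the value to sort by.
    static final class CompositeKey {
        final String naturalKey;
        final int secondary;

        CompositeKey(String naturalKey, int secondary) {
            this.naturalKey = naturalKey;
            this.secondary = secondary;
        }
    }

    // Sort comparator: natural key first, then the secondary field.
    static final Comparator<CompositeKey> SORT =
            Comparator.comparing((CompositeKey k) -> k.naturalKey)
                      .thenComparingInt(k -> k.secondary);

    // Partitioner: hash only the natural key, so every composite key that
    // shares it reaches the same reducer.
    static int partition(CompositeKey key, int numReducers) {
        return (key.naturalKey.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Grouping: after sorting, adjacent keys with the same natural key form one
    // reduce() call, whose values arrive ordered by the secondary field.
    static Map<String, List<Integer>> reduceInputs(List<CompositeKey> keys) {
        List<CompositeKey> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (CompositeKey k : sorted) {
            groups.computeIfAbsent(k.naturalKey, n -> new ArrayList<>()).add(k.secondary);
        }
        return groups;
    }
}
```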

0 votes

3 answers

Split input to a reducer in hadoop

This question is related to my other question, Hadoop handling data skew in reducer. However, I would like to ask if there are configuration settings available so that if, say, the max reducer memory is reached, then spawn off a new reducer…

hadoop mapreduce hadoop-partitioning reducers

asked Sep 17 '15 at 18:40

sunny (reputation 824)

0 votes

2 answers

Hadoop handling data skew in reducer

I am trying to determine if there are certain hooks available in the Hadoop API (Hadoop 2.0.0, MRv1) to handle data skew for a reducer. Scenario: I have a custom composite key and partitioner in place to route data to reducers. In order to deal with the…

hadoop hadoop-partitioning reducers

asked Sep 17 '15 at 10:13

sunny (reputation 824)
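The two data-skew questions above both deal with one reducer receiving too much data. One common workaround, swapped in here rather than taken from the questions, is key salting. A small suffix is appended to a hot key so its records hash to several reducers, and the partial results are then merged in a second step keyed on the unsalted key. This only works for aggregations whose partial results can be merged (sums, counts, maxima). The plain-Java sketch below shows only the routing effect. `KeySaltingSketch`, the `#` separator and the round-robin salt are assumptions.

```java
import java.util.List;

// Hypothetical sketch of key salting for a skewed (hot) key; plain Java.
final class KeySaltingSketch {
    // Appends a salt in [0, saltBuckets), chosen round-robin by record index.
    static String salt(String key, long recordIndex, int saltBuckets) {
        return key + "#" + (recordIndex % saltBuckets);
    }

    // Strips the salt again for the second, merging stage.
    static String unsalt(String saltedKey) {
        return saltedKey.substring(0, saltedKey.lastIndexOf('#'));
    }

    // Same formula as Hadoop's default HashPartitioner.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Counts records per reducer, with salting when saltBuckets > 1.
    static int[] load(List<String> keys, int numReducers, int saltBuckets) {
        int[] counts = new int[numReducers];
        long i = 0;
        for (String k : keys) {
            String effective = saltBuckets > 1 ? salt(k, i, saltBuckets) : k;
            counts[partition(effective, numReducers)]++;
            i++;
        }
        return counts;
    }
}
```

With 100 records of the key "hot" and 4 reducers, all 100 land on one reducer without salting, while 4 salts spread them 25 per reducer for these particular keys.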
