Questions tagged [hadoop-partitioning]

The hadoop-partitioning tag covers questions about how Hadoop decides which key/value pairs are sent to which reducer (partition).
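
As a rough illustration of that routing decision, here is a minimal Python sketch of the arithmetic used by Hadoop's default `HashPartitioner`, which computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The function names `java_string_hash` and `get_partition` are hypothetical helpers written for this example, and the sketch assumes keys are hashed like `java.lang.String` (BMP characters only); real Writable key types such as `Text` define their own `hashCode()`, so actual partition numbers on a cluster may differ.

```python
from collections import defaultdict


def java_string_hash(s: str) -> int:
    """Emulate java.lang.String.hashCode() as a signed 32-bit integer.

    Assumption: characters are in the Basic Multilingual Plane, so each
    Python character corresponds to one Java UTF-16 code unit.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # wrap like Java int overflow
    return h - 0x100000000 if h >= 0x80000000 else h


def get_partition(key: str, num_reduce_tasks: int) -> int:
    """Mirror the default rule: clear the sign bit, then take the modulus."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reduce_tasks


if __name__ == "__main__":
    # Group a few sample keys by the reducer index they would be routed to.
    buckets = defaultdict(list)
    for key in ["apple", "banana", "cherry", "apple"]:
        buckets[get_partition(key, 3)].append(key)
    for reducer, keys in sorted(buckets.items()):
        print(reducer, keys)
```

Because the partition depends only on the key's hash and the number of reducers, every occurrence of the same key lands on the same reducer, which is also why a few very frequent keys can overload a single reducer.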

339 questions
0
votes
1 answer

Gathering multiple mappers' sorted results at the Reducer in Hadoop

I have multiple very large files (nearly 500 MB) as input to my MR program. I divide (split) these files into equal-size partitions. Each Mapper gets a single partition of a file. Mapper: Key = (filename, partition_number) and Value = (character stream of…
0
votes
1 answer

Can a slave node have multiple blocks of the same file in Hadoop?

Say I have a Hadoop cluster where one node is the master node and the other is a data node. The slave node is an 8-core machine, just to make sure there are enough cores to process jobs in parallel. Can I still split the file into, say, 3 blocks and…
Sheel Pancholi
0
votes
1 answer

TotalOrderPartition with ChainMapper

I have a ChainMapper with 2 mappers associated with it. I am trying to perform a TotalOrderPartition on the last mapper in the chain, without much success. Is there a way to enforce partitioning based on some sampling on the Nth mapper in the…
bitan
0
votes
1 answer

How to sort a column in a data set in descending order using Java Hadoop MapReduce?

My data file is: Utsav Chatterjee Dangerous Soccer Coldplay 4 Rodney Purtle Awesome Football Maroon5 3 Michael Gross Amazing Basketball Iron Maiden 6 Emmanuel Ezeigwe Cool Pool Metallica 5 John Doe Boring Golf …
0
votes
1 answer

Creating custom key/value pairs for mappers in Hadoop from a file

I have a file of size 50 MB (complete text data without spaces). I want to partition this data in such a way that each mapper gets 5 MB of data. The mapper should get data in (K,V) format, where the key is the partition number (like 1,2,..) and the value is the plain…
Sumit
0
votes
0 answers

Hadoop Streaming: How to partition output into subfolders?

To be specific, for example, given hadoop jar hadoop-streaming.jar \ -input myInputDirs \ -output myOutputDir \ -mapper /bin/cat \ -reducer /usr/bin/wc Where myInputDirs has a dated subfolder structure of input_dir/yyyy/mm/dd/part-* I…
Osiris
0
votes
1 answer

How to make a UNION in Hive over two EXTERNAL TABLES which point to the same file

I'm trying to write a Hive script which creates two external tables, both of them pointing to the same file LOCATION with different regular expressions (filters). When I try to make a UNION between them, the results aren't as expected. The first…
marcos
0
votes
1 answer

Why is `getNumPartitions()` not giving me the correct number of partitions specified by `repartition`?

I have a textFile in an RDD like so: sc.textFile(). I try to repartition the RDD in order to speed up processing: sc.repartition(). No matter what I put in for , it does not seem to change, as indicated by: RDD.getNumPartitions()…
makansij
0
votes
1 answer

HashPartition in MapReduce

Objective: Implement HashPartition and check the number of reducers that are created automatically. Any help and any sample code is always appreciated for this purpose. What I did: I ran a MapReduce program with Hash Partition implemented…
Ritab
0
votes
1 answer

How to deal with .gz input files with Hadoop?

Please allow me to provide a scenario: hadoop jar test.jar Test inputFileFolder outputFileFolder, where test.jar sorts info by key, time, and place; inputFileFolder contains multiple .gz files, each about 10 GB; outputFileFolder…
frankilee
0
votes
3 answers

Insert partitioned data into a partitioned Hive table

I have stored the data in HDFS using Pig MultiStorage with the column id, so the data is stored as /output/1/part-0000 /output/2/ /output/3/. Now I have created a partitioned table in Hive and I want to load the data from the /output folder into this…
wazza
0
votes
1 answer

Hive: Empty buckets getting created after partitioning in HDFS

I was trying to create partitions and buckets using Hive. To set some of the properties: set hive.enforce.bucketing = true; SET hive.exec.dynamic.partition = true; SET hive.exec.dynamic.partition.mode = nonstrict; Below is the code for creating…
user182944
0
votes
0 answers

Hadoop KeyComposite and Combiner

I am doing a secondary sort in Hadoop 2.6.0, following this tutorial: https://vangjee.wordpress.com/2012/03/20/secondary-sorting-aka-sorting-values-in-hadoops-mapreduce-programming-paradigm/ I have the exact same code, but now I am trying to…
0
votes
3 answers

Split input to a reducer in Hadoop

This question is related to my other question, "Hadoop handling data skew in reducer". However, I would like to ask whether there are configuration settings available so that if, say, the max reducer memory is reached, then spawn off a new reducer…
sunny
0
votes
2 answers

Hadoop handling data skew in reducer

I am trying to determine whether there are hooks available in the Hadoop API (Hadoop 2.0.0 MRv1) to handle data skew for a reducer. Scenario: I have a custom composite key and partitioner in place to route data to reducers. In order to deal with the…
sunny