Questions tagged [hadoop-partitioning]

Hadoop partitioning deals with questions about how Hadoop decides which key/value pairs are sent to which reducer (partition).

339 questions
0
votes
1 answer

Hadoop Map task/Map object

In theory, the following properties define the number of map/reduce task slots on a data node: mapred.tasktracker.map.tasks.maximum | mapred.map.tasks. Also, the number of mapper objects is decided by the number of input splits in the MapReduce job. We…
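A note on this entry: mapred.tasktracker.map.tasks.maximum is a per-node slot limit read by the TaskTracker, while mapred.map.tasks is only a hint to the framework; the real mapper count is the number of input splits. A minimal old-API driver sketch (class name and path arguments are hypothetical):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class MapTaskHint {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(MapTaskHint.class);
            conf.setJobName("map-task-hint");
            // Equivalent to setting mapred.map.tasks: a hint only. The
            // framework derives the actual number of map tasks from the
            // number of input splits computed at submission time.
            conf.setNumMapTasks(10);
            // mapred.tasktracker.map.tasks.maximum, by contrast, lives in the
            // TaskTracker's mapred-site.xml and cannot be set per job.
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            JobClient.runJob(conf); // identity map/reduce by default
        }
    }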
0
votes
1 answer

HDInsight Azure Blob Storage Data Update

I am considering HDInsight with Hive and data loaded on Azure Blob Storage. There is a combination of both historic and changing data. Does the solution mentioned in Update, SET option in Hive work with blob storage too? The Hive statement below…
0
votes
1 answer

Hadoop MapReduce partitioner not invoked

I need help with a MapReduce job: my custom partitioner is never invoked. I have checked everything a million times, but no result. It used to work a while ago; I have no idea why it doesn't now. Any help would be much appreciated. I am adding the code (It…
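For context on this entry, the two classic causes: the partitioner was never registered on the job, or the job runs with fewer than two reducers (with 0 or 1 reduce tasks the partitioner is skipped entirely). A minimal sketch, with hypothetical key/value types and routing rule:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            // Keys starting with 'a'..'m' go to partition 0, the rest to 1.
            String s = key.toString();
            char c = s.isEmpty() ? 'z' : Character.toLowerCase(s.charAt(0));
            return (c <= 'm' ? 0 : 1) % numPartitions;
        }
    }

    // In the driver, both lines matter; with numReduceTasks <= 1 the
    // partitioner is never invoked:
    //   job.setPartitionerClass(FirstLetterPartitioner.class);
    //   job.setNumReduceTasks(2);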
0
votes
3 answers

Provide map splits with splits of the same file

How can I provide each line of a file fed to the mapper with splits of the same file? Basically, what I want to do is: for each line in file-split { for each line in file { //process } } Can I do this using MapReduce in…
Nitin J
  • 78
  • 1
  • 2
  • 9
0
votes
1 answer

Using Hive to select data within large range partitions

I've run into a problem using Hive to select data within large range partitions. Here's the HQL I want to execute: INSERT OVERWRITE TABLE summary_T partition(DateRange='20131222-20131228') select col1, col2, col3 From RAW_TABLE where cdate…
Dennis Shen
  • 61
  • 1
  • 6
0
votes
1 answer

When do two different keys go to the same reducer under the default hash partitioner in Hadoop?

As we know, Hadoop guarantees that identical keys coming from different mappers will be sent to the same reducer. But if two different keys have the same hash value, they will definitely go to the same reducer; so will they be sent to the… (see the sketch after this entry)
Judking
  • 6,111
  • 11
  • 55
  • 84
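The short answer to the entry above: keys that collide modulo the reducer count land on the same reducer, but each distinct key still gets its own reduce() call, because grouping compares the keys themselves, not their hashes. Hadoop's default partitioner is essentially this:

    import org.apache.hadoop.mapreduce.Partitioner;

    // What org.apache.hadoop.mapreduce.lib.partition.HashPartitioner does:
    public class HashPartitioner<K, V> extends Partitioner<K, V> {
        public int getPartition(K key, V value, int numReduceTasks) {
            // Mask the sign bit so the result is non-negative, then take the
            // remainder; different keys with equal remainders share a reducer.
            return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }
    }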
0
votes
2 answers

Partitioner or MultipleOutputs

I would like to have your opinion regarding Partitioner vs. MultipleOutputs. Suppose I have a file which contains keys such as 0:aaa 1:bbb 0:ccc 0:ddd ... 1:zzz I would like to have 2 files: one file containing keys starting with 0: and the…
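One way to get the two files asked about in this entry is MultipleOutputs in the reducer (the alternative is a two-reducer Partitioner, which yields exactly one part file per prefix). A minimal sketch; the named outputs "zeros" and "ones" are hypothetical and must be registered in the driver with MultipleOutputs.addNamedOutput(...):

    import java.io.IOException;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    public class SplitByPrefixReducer
            extends Reducer<Text, Text, NullWritable, Text> {
        private MultipleOutputs<NullWritable, Text> mos;

        @Override
        protected void setup(Context ctx) {
            mos = new MultipleOutputs<>(ctx);
        }

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context ctx)
                throws IOException, InterruptedException {
            // Route records by key prefix into the two named outputs.
            String name = key.toString().startsWith("0:") ? "zeros" : "ones";
            for (Text v : values) {
                mos.write(name, NullWritable.get(), v);
            }
        }

        @Override
        protected void cleanup(Context ctx)
                throws IOException, InterruptedException {
            mos.close(); // flush the extra output files
        }
    }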
0
votes
3 answers

How to group by data from Hive with a specific partition?

I have the following: hive> show partitions TABLENAME pt=2012.07.28.08 pt=2012.07.28.09 …
user2935539
  • 73
  • 2
  • 6
0
votes
0 answers

Hadoop disk usage (intermediate reduce)

I'm new to Hadoop; I'm using a cluster and I have a disk quota of 15GB. If I try to execute the wordcount sample on a big dataset (about 25GB), I always receive the exception "The DiskSpace quota of xxxx is exceeded: ". I checked my disk usage after…
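On the quota problem in this entry: both the shuffled intermediate data and the job output (multiplied by the HDFS replication factor) consume space, and compression is the usual first remedy. A minimal sketch using MR1-era property names (newer releases spell them mapreduce.map.output.compress*):

    import org.apache.hadoop.conf.Configuration;

    public class CompressMapOutput {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Compress the intermediate map output that is spilled locally
            // and shuffled to the reducers.
            conf.setBoolean("mapred.compress.map.output", true);
            conf.set("mapred.map.output.compression.codec",
                     "org.apache.hadoop.io.compress.GzipCodec");
            // ... build and submit the wordcount Job with this Configuration ...
        }
    }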
0
votes
1 answer

Sending data from all mappers to all reducers

Before this question is flagged as a duplicate, please read through. This has been asked many times with no clear answer. Let's say my task is to compute the unigram probability of every word in millions of files. I can emit word counts from… (see the sketch after this entry)
abhinavkulkarni
  • 2,284
  • 4
  • 36
  • 54
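One pattern that fits this entry (a sketch, not the only answer): emit any record every reducer needs once per reducer, tagging the key with an explicit target partition, and route on that tag in a custom partitioner. The key format and names below are hypothetical, and ordinary word keys are assumed not to contain '#':

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // In the mapper, broadcast the corpus-total record to every reducer:
    //   for (int p = 0; p < context.getNumReduceTasks(); p++) {
    //       context.write(new Text(p + "#*TOTAL*"), new IntWritable(total));
    //   }

    public class TaggedPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            String s = key.toString();
            int sep = s.indexOf('#');
            if (sep > 0) {
                // Broadcast record: honor the explicit partition tag.
                return Integer.parseInt(s.substring(0, sep)) % numPartitions;
            }
            // Ordinary word record: default hash routing.
            return (s.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }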
0
votes
2 answers

How does Hadoop decide to distribute among buckets/nodes?

I am new to Map/Reduce and the Hadoop framework. I am running a Hadoop program on a single machine (to try it out). I have n input files and I want a summary of the words from those files. I know the map function returns key/value pairs, but how map is…
0
votes
0 answers

How to best decide mapper output/reducer input for a huge string

I need to improve my MR job, which uses HBase as both source and sink. Basically, I'm reading data from 3 HBase tables in the mapper, writing it out as one huge string for the reducer to do some computation on and dump into an HBase table…
Pavan
  • 658
  • 2
  • 7
  • 28
0
votes
1 answer

How is input of small size read by a mapper in map-reduce?

I have a map-reduce job whose input is a big data set (let's say of size 100GB). What this map-reduce job does is split the big data into chunks and write separate files, one per data chunk. That is, the output of the job is multiple…
HHH
  • 6,085
  • 20
  • 92
  • 164
0
votes
1 answer

How does the map-reduce framework split the input file into chunks?

I have an iterative mapreduce job in which, when a chunk, let's say Chunk i, is read by a mapper, some information regarding the records within this chunk is stored in an auxiliary file called F_i. In the next iteration (job), a different mapper… (see the sketch after this entry)
HHH
  • 6,085
  • 20
  • 92
  • 164
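For the chunking mechanics behind this entry: with FileInputFormat each mapper's chunk is an input split, sized as max(minSplitSize, min(maxSplitSize, blockSize)), and both bounds can be set per job. A minimal new-API sketch (assumes Hadoop 2.x; the 64MB cap is illustrative and the driver is truncated):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class SplitSizeDemo {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "split-size-demo");
            FileInputFormat.addInputPath(job, new Path(args[0]));
            // Cap each split at 64 MB so no mapper reads a larger chunk.
            FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);
            FileInputFormat.setMinInputSplitSize(job, 1L);
            // ... mapper/reducer setup and job.waitForCompletion(true) ...
        }
    }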
0
votes
2 answers

Can a reduce task accept compressed data in Hadoop?

We see that map can accept and output both compressed and uncompressed data. I was going through Cloudera training and the teacher mentioned that reduce task input has to be in the form of key/value pairs and thus can't work on compressed data. Is that right? If that's… (see the sketch after this entry)
bruceparker
  • 1,235
  • 1
  • 17
  • 33
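On this last entry: a reduce task always sees deserialized key/value pairs, but the bytes shuffled to it may be compressed; the framework inflates them transparently during the merge, so compression does not conflict with the key/value contract. A minimal sketch with Hadoop 2.x property names (codec choice is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CompressedShuffleDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Compress the map -> reduce shuffle; reducers still receive
            // ordinary key/value pairs after decompression.
            conf.setBoolean("mapreduce.map.output.compress", true);
            conf.setClass("mapreduce.map.output.compress.codec",
                          GzipCodec.class, CompressionCodec.class);
            Job job = Job.getInstance(conf, "compressed-shuffle-demo");
            // Compressing the final job output is a separate, independent knob.
            FileOutputFormat.setCompressOutput(job, true);
            FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
            // ... input/output paths, mapper/reducer, submit ...
        }
    }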