Questions tagged [combiners]

105 questions
3 votes, 5 answers

Who will get a chance to execute first, Combiner or Partitioner?

I'm confused after reading the passage below in Hadoop: The Definitive Guide, 4th edition (page 204): Before it writes to disk, the thread first divides the data into partitions corresponding to the reducers that they will ultimately be sent to.…
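The quoted passage describes the order inside one map-side spill: records are partitioned first, then sorted by key within each partition, and only then is the combiner (if one is set) run on the sorted output. The Python simulation below is a sketch of that order. It assumes a word-count sum combiner and uses a hypothetical `partition_for` function as a deterministic stand-in for Hadoop's hash partitioner. None of these names are Hadoop APIs.

```python
from itertools import groupby
from operator import itemgetter

def partition_for(key, num_reducers):
    # Hypothetical stand-in for hash(key) % numReducers; a byte sum is used
    # because Python's built-in str hash is randomized per process.
    return sum(key.encode("utf-8")) % num_reducers

def sum_combiner(key, values):
    # Word-count style combiner: collapse all values for one key into a sum.
    yield key, sum(values)

def spill(map_output, num_reducers, combiner=None):
    """Simulate one map-side spill: partition, sort within partition, then combine."""
    # 1. Partition: assign each record to the reducer it will be sent to.
    partitions = {p: [] for p in range(num_reducers)}
    for key, value in map_output:
        partitions[partition_for(key, num_reducers)].append((key, value))

    result = {}
    for p, records in partitions.items():
        # 2. Sort by key inside each partition.
        records.sort(key=itemgetter(0))
        # 3. Run the combiner (if any) on the sorted records, one key group at a time.
        if combiner is not None:
            combined = []
            for key, group in groupby(records, key=itemgetter(0)):
                combined.extend(combiner(key, [v for _, v in group]))
            records = combined
        result[p] = records
    return result

map_output = [("b", 1), ("a", 1), ("b", 1), ("c", 1), ("a", 1)]
print(spill(map_output, num_reducers=2, combiner=sum_combiner))
# {0: [('b', 2)], 1: [('a', 2), ('c', 1)]}
```

Because the partition is chosen before the combiner runs, each combiner invocation only sees records that already share a partition.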
3 votes, 2 answers

Partial aggregation vs. combiners: which one is faster?

There are notes about how Cascading/Scalding optimize map-side evaluation: they use so-called partial aggregation. Is it actually a better approach than combiners? Are there any performance comparisons on some common Hadoop tasks (word count for…
yura
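Partial aggregation is generally described as in-mapper combining. The map task keeps a bounded in-memory cache of running aggregates and emits an entry when it is evicted. A combiner, by contrast, re-reads records after they have been serialized and sorted. The Python sketch below illustrates the idea. It assumes a sum aggregate and a hypothetical `max_entries` threshold with least-recently-used eviction, and Cascading's or Scalding's actual cache policy and defaults may differ.

```python
from collections import OrderedDict

class PartialSumAggregator:
    """In-mapper partial aggregation with a bounded LRU cache (illustrative only)."""

    def __init__(self, max_entries, emit):
        self.max_entries = max_entries  # hypothetical cache-size threshold
        self.emit = emit                # callback receiving (key, partial_sum)
        self.cache = OrderedDict()

    def add(self, key, value):
        if key in self.cache:
            self.cache[key] += value
            self.cache.move_to_end(key)  # mark as most recently used
        else:
            self.cache[key] = value
            if len(self.cache) > self.max_entries:
                # Evict the least recently used key and emit its partial sum.
                old_key, old_sum = self.cache.popitem(last=False)
                self.emit(old_key, old_sum)

    def flush(self):
        # Called when the map task finishes: emit everything still cached.
        while self.cache:
            self.emit(*self.cache.popitem(last=False))

emitted = []
agg = PartialSumAggregator(max_entries=2, emit=lambda k, v: emitted.append((k, v)))
for word in ["a", "b", "a", "c", "a", "b"]:
    agg.add(word, 1)
agg.flush()
print(emitted)  # [('b', 1), ('c', 1), ('a', 3), ('b', 1)]
```

Six input records become four emitted ones. Key `b` is emitted twice because it was evicted early, so the reducer still has to merge partial sums. The trade-off against a combiner is mapper memory in exchange for fewer records serialized and spilled.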
2 votes, 1 answer

Combiner hack for Hadoop streaming

The current version of hadoop-streaming requires a Java class for the combiner, but I read somewhere that we can use a hack like the following: hadoop jar ./contrib/streaming/hadoop-0.20.2-streaming.jar -input /testinput -output /testoutput -mapper…
greenberet123
2 votes, 1 answer

Can I use Combiner to compute average in a mapreduce job?

I want to implement a MapReduce job that reads Parquet files with the following schema: { optional int96 dropoff_datetime; optional float dropoff_latitude; optional float dropoff_longitude; optional int32 dropoff_taxizone_id; optional…
rafik_bougacha
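A combiner cannot simply average averages, because the framework may apply it to any subset of a key's values. The usual workaround is for the mapper and combiner to carry `(sum, count)` pairs, with the division happening only in the reducer. The Python sketch below shows that pattern. The key stands in for a taxi-zone id like `dropoff_taxizone_id` from the question's schema. The numeric values and the function names are hypothetical.

```python
def map_record(zone_id, value):
    # Mapper: emit (sum, count) for each record instead of the raw value.
    return zone_id, (value, 1)

def combine(key, pairs):
    # Combiner: add sums and counts. The output has the same shape as the input,
    # so the framework can safely run it zero, one or several times.
    total = sum(s for s, _ in pairs)
    count = sum(c for _, c in pairs)
    return key, (total, count)

def reduce_avg(key, pairs):
    # Reducer: merge the partial (sum, count) pairs, then divide exactly once.
    _, (total, count) = combine(key, pairs)
    return key, total / count

# Hypothetical values for one zone, split across two map tasks.
task1 = [map_record(7, v) for v in [1.0, 2.0, 3.0]]
task2 = [map_record(7, v) for v in [10.0]]
partials = [combine(7, [p for _, p in task1])[1], combine(7, [p for _, p in task2])[1]]

print(reduce_avg(7, partials))  # (7, 4.0)
# Averaging the per-task averages instead would give (2.0 + 10.0) / 2 = 6.0.
```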
2 votes, 2 answers

Does HBase MapReduce support a combiner stage? And if so, how?

Hadoop MapReduce supports a combiner stage. However, I can't find a similar capability in the HBase MapReduce package. Does it exist?
user44242
2 votes, 3 answers

What's the difference between shuffle phase and combiner phase?

I'm pretty confused about the MapReduce framework, and reading different sources only adds to the confusion. This is my idea of a MapReduce job: 1. Map() --> emit 2. Partitioner (OPTIONAL) --> divide intermediate…
rollotommasi
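One way to see the difference is by scope. A combiner is optional and runs locally on the output of a single map task. The shuffle is the framework step that moves every map task's output for a key to one reducer and groups it there. The Python model below sketches that data flow with two hypothetical map tasks and a word-count sum. It does not model Hadoop's actual implementation.

```python
from collections import defaultdict

def local_combine(records):
    # Combiner: runs on one map task's output only; it never sees other tasks' data.
    sums = defaultdict(int)
    for key, value in records:
        sums[key] += value
    return sorted(sums.items())

def shuffle(map_outputs):
    # Shuffle: gather every map task's records for a key into one reduce group.
    groups = defaultdict(list)
    for records in map_outputs:
        for key, value in records:
            groups[key].append(value)
    return dict(sorted(groups.items()))

task1 = [("x", 1), ("y", 1), ("x", 1)]
task2 = [("x", 1), ("z", 1)]

combined = [local_combine(task1), local_combine(task2)]
print(combined)           # [[('x', 2), ('y', 1)], [('x', 1), ('z', 1)]]
print(shuffle(combined))  # {'x': [2, 1], 'y': [1], 'z': [1]}
```

Key `x` still reaches the reducer as two partial values, because each combiner only saw its own task's output. The reducer performs the final sum. Without the combiner, the shuffle would carry three records for `x` instead of two.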
2 votes, 2 answers

Spark Scala: GroupByKey and sort

I have an RDD with the following structure: val rdd = RDD[ (category: String, product: String, score: Double) ]. My objective is to group the data by category and then, for each category, sort the (product, score) tuples by score. As for now…
Mohitt
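The per-category logic is short enough to sketch in plain Python: group the `(product, score)` pairs by category, then sort each group by score. In Spark this corresponds to mapping each row to `(category, (product, score))`, calling `groupByKey`, and sorting each group's values. The sample rows below are hypothetical, and so is the assumption that the sort is descending.

```python
from collections import defaultdict

def group_and_sort(rows, descending=True):
    """Group (category, product, score) rows by category and sort each group by score."""
    groups = defaultdict(list)
    for category, product, score in rows:
        groups[category].append((product, score))
    return {
        category: sorted(items, key=lambda item: item[1], reverse=descending)
        for category, items in groups.items()
    }

rows = [
    ("books", "b1", 0.4),
    ("toys", "t1", 0.9),
    ("books", "b2", 0.8),
    ("toys", "t2", 0.1),
]
print(group_and_sort(rows))
# {'books': [('b2', 0.8), ('b1', 0.4)], 'toys': [('t1', 0.9), ('t2', 0.1)]}
```

Unlike `reduceByKey`, Spark's `groupByKey` does no map-side combining. Every `(product, score)` pair is shuffled and each category's group is held together, which is unavoidable here because sorting needs the whole group.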
2 votes, 2 answers

How to join 2 arrays into a single JSON array in Node

I have 2 arrays in Node: ['3', '7'] and ['Circulatory and Cardiovascular', 'Respiratory']. I want to produce the result below: [{"id": "3","name":"Circulatory and Cardiovascular"},{"id": "7","name":"Respiratory"}]
Shanthi
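The pairing is positional: element i of the id array goes with element i of the name array. The Python sketch below shows it, for consistency with the other examples here. In Node, the same idea is an `Array.prototype.map` over one array that uses the index to read the other.

```python
import json

ids = ['3', '7']
names = ['Circulatory and Cardiovascular', 'Respiratory']

# Pair elements by position; zip() silently stops at the shorter list,
# so check the lengths first.
if len(ids) != len(names):
    raise ValueError("arrays must have the same length")
records = [{"id": i, "name": n} for i, n in zip(ids, names)]
print(json.dumps(records))
# [{"id": "3", "name": "Circulatory and Cardiovascular"}, {"id": "7", "name": "Respiratory"}]
```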
2 votes, 1 answer

Hadoop combiner execution on reducers

I have a long-running MapReduce job with some mappers taking considerably more time than others. Checking the stats on the web interface, I saw that my combiner also kicked in on the reducers (which were mostly idle as just 2 mappers were still…
dominik
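The framework decides whether and how often to run the combiner. It may run on map-side spills and, as observed here, during the reduce-side merge. A class used as a combiner must therefore give the same final result whether it runs zero, one or several times. The Python sketch below contrasts two reducers. A summing reducer is safe to reuse as a combiner. A hypothetical counting reducer, which counts values instead of summing them, is not.

```python
def summing_reducer(values):
    # Sums partial counts: associative and commutative, and its output type
    # matches its input type, so reapplying it to partial data is harmless.
    return [sum(values)]

def counting_reducer(values):
    # Counts how many values arrived: correct only when every value is a raw 1,
    # so it breaks once a combiner pass has pre-aggregated some of them.
    return [len(values)]

def run(chunks, combiner, reducer):
    """Apply `combiner` to each chunk (one combiner invocation each), then `reducer` once."""
    partials = []
    for chunk in chunks:
        partials.extend(combiner(chunk))
    return reducer(partials)[0]

ones = [1, 1, 1, 1]            # four occurrences of one word
chunks = [ones[:3], ones[3:]]  # hypothetical split across two combiner invocations

print(run([[v] for v in ones], lambda vs: vs, summing_reducer))  # 4 (no combining)
print(run(chunks, summing_reducer, summing_reducer))             # 4
print(run(chunks, counting_reducer, counting_reducer))           # 2 (wrong)
```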
2 votes, 1 answer

Combine values of a multidimensional array

I have multidimensional array output like the one below, in which I want to combine the values of pid and map, using any separator except a comma (,), where the id is the same. This is sample data; the array has more than 20000 values and the depth level is unknown, maybe…
Mark Pole
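The actual array layout is not shown, so this Python sketch rests on assumptions. Leaf records are assumed to be dicts with `id`, `pid` and `map` keys, nested in lists of unknown depth. The `|` separator is a hypothetical choice, since any separator except a comma is allowed. The walk uses an explicit stack instead of recursion so that deep nesting does not hit a recursion limit.

```python
def iter_records(node):
    """Walk nested lists of unknown depth and yield every dict that has an 'id' key."""
    # Assumption: leaf records are dicts with 'id', 'pid' and 'map' keys.
    stack = [node]
    while stack:
        item = stack.pop()
        if isinstance(item, dict) and "id" in item:
            yield item
        elif isinstance(item, (list, tuple)):
            stack.extend(reversed(item))  # keep the original left-to-right order

def combine_by_id(data, sep="|"):
    """Merge records with the same id, joining their pid and map values with `sep`."""
    merged = {}
    for rec in iter_records(data):
        entry = merged.setdefault(rec["id"], {"pid": [], "map": []})
        entry["pid"].append(str(rec["pid"]))
        entry["map"].append(str(rec["map"]))
    return [
        {"id": rid, "pid": sep.join(e["pid"]), "map": sep.join(e["map"])}
        for rid, e in merged.items()
    ]

data = [
    {"id": 1, "pid": 10, "map": "x"},
    [{"id": 2, "pid": 20, "map": "y"}, [{"id": 1, "pid": 11, "map": "z"}]],
]
print(combine_by_id(data))
# [{'id': 1, 'pid': '10|11', 'map': 'x|z'}, {'id': 2, 'pid': '20', 'map': 'y'}]
```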
2 votes, 4 answers

How can I combine rows within the same data frame in R (based on duplicate values under a specific column)?

Sample of 2 (made-up) example rows in df:

userid  facultyid  courseid  schoolid
167     265        NA        1678
167     71111      301       NA

Suppose that I have a couple hundred duplicate userids like in the above example. However, the vast…
poeticpersimmon
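One way to merge duplicate rows is to keep, for each column, the first non-missing value among the rows that share a `userid`. The Python sketch below does this on the question's two sample rows, with R's `NA` represented as `None`. The excerpt is truncated and does not say how to resolve conflicting non-missing values, like the two `facultyid` values here. The sketch assumes the first row wins.

```python
def collapse_rows(rows, key="userid"):
    """Merge rows sharing `key`, filling each column with the first non-missing value."""
    # Missing values (R's NA) are represented as None here.
    # Assumption: on conflicting non-missing values, the first row wins.
    merged = {}
    for row in rows:
        target = merged.setdefault(row[key], dict(row))  # copy; input is not mutated
        for column, value in row.items():
            if target.get(column) is None and value is not None:
                target[column] = value
    return list(merged.values())

df = [
    {"userid": 167, "facultyid": 265, "courseid": None, "schoolid": 1678},
    {"userid": 167, "facultyid": 71111, "courseid": 301, "schoolid": None},
]
print(collapse_rows(df))
# [{'userid': 167, 'facultyid': 265, 'courseid': 301, 'schoolid': 1678}]
```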
2 votes, 2 answers

Combiners, Reducers and EcoSystemProject in Hadoop

What do you think the answer to Question 4 on this site should be? Is the given answer right or wrong? QUESTION 4: In the standard word count MapReduce algorithm, why might using a combiner reduce the overall job running time? A. Because…
USB
2 votes, 1 answer

How to disable hadoop combiner?

In the WordCount example, the combiner is explicitly set with job.setCombinerClass(IntSumReducer.class). I would like to disable the combiner so that the mapper output is not processed by the combiner. Is there a way to do that using MR config files…
polerto
2 votes, 2 answers

Mapreduce job: combiner without reducer

I noticed that if I set the number of reducers to 0, the combiner doesn't run. Is it possible to use a combiner without a reducer? Thanks.
avhacker
2 votes, 2 answers

Two equal combine keys do not get to the same reducer

I'm making a Hadoop application in Java with the MapReduce framework. I use only Text keys and values for both input and output. I use a combiner to do an extra step of computations before reducing to the final output. But I have the problem that…
Bjarkes
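The excerpt is truncated, so the following is only one possible cause. The partition is computed before the combiner runs. A combiner that emits a different key than it received therefore leaves its output in the partition chosen for the original key. Two records whose keys become equal only after combining can then reach different reducers. The Python sketch below demonstrates this with a hypothetical key-normalizing combiner and a deterministic stand-in for the hash partitioner.

```python
def partition_for(key, num_reducers):
    # Hypothetical deterministic stand-in for hash(key) % numReducers.
    return sum(key.encode("utf-8")) % num_reducers

def rekeying_combiner(key, values):
    # Deliberately buggy: the output key differs from the input key.
    yield key.lower(), sum(values)

def map_side(records, num_reducers, combiner):
    """Partition first, then combine inside each partition (the order Hadoop uses)."""
    by_partition = {}
    for key, value in records:
        p = partition_for(key, num_reducers)
        by_partition.setdefault(p, {}).setdefault(key, []).append(value)
    out = []
    for partition, groups in sorted(by_partition.items()):
        for key, values in sorted(groups.items()):
            for new_key, new_value in combiner(key, values):
                # The combiner's output stays in the partition of the ORIGINAL key.
                out.append((partition, new_key, new_value))
    return out

print(map_side([("Apple", 1), ("apple", 1)], num_reducers=3, combiner=rekeying_combiner))
# [(0, 'apple', 1), (2, 'apple', 1)]  -> the same key lands in two partitions
```

The safe pattern is for the combiner to emit exactly the key it received. Any key normalization belongs in the mapper, before partitioning.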