Questions tagged [mapreduce]

MapReduce is an algorithm for processing huge datasets on certain kinds of distributable problems using a large number of nodes.

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.
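As a minimal sketch of the model described above, the following single-process Python example counts words. The names `map_fn`, `reduce_fn`, and `run_mapreduce` and the sample documents are made up for illustration; they are not part of Hadoop or any other framework, and a real implementation would distribute the work across machines.

```python
from collections import defaultdict

def map_fn(key, value):
    """Map: take one (document name, text) pair, emit (word, 1) pairs."""
    return [(word.lower(), 1) for word in value.split()]

def reduce_fn(key, values):
    """Reduce: merge all intermediate values that share one key."""
    return key, sum(values)

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Hypothetical single-process driver: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            groups[ikey].append(ivalue)  # group intermediate values by key
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))

# Illustrative input: (document name, document text) pairs.
documents = [
    ("doc1", "the quick brown fox"),
    ("doc2", "the lazy dog and the fox"),
]
counts = run_mapreduce(documents, map_fn, reduce_fn)
print(counts)
```

The grouping step in the middle is what a framework usually calls the shuffle: it is the only place where data from different map calls meets.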

The advantage of MapReduce is that it allows for distributed processing of the map and reduction operations. Provided each mapping operation is independent of the others, all maps can be performed in parallel, though in practice this is limited by the data source and/or the number of CPUs near that data. Similarly, a set of 'reducers' can perform the reduction phase; all that is required is that all outputs of the map operation that share the same key are presented to the same reducer at the same time. While this process can often appear inefficient compared to more sequential algorithms, MapReduce can be applied to significantly larger datasets than a single "commodity" server can handle: a large server farm can use MapReduce to sort a petabyte of data in only a few hours. The parallelism also offers some possibility of recovering from partial failure of servers or storage during the operation: if one mapper or reducer fails, the work can be rescheduled, assuming the input data is still available.
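One common way to meet the "same key, same reducer" requirement is to pick the reducer from a hash of the key. The sketch below assumes that approach; the function names and the choice of CRC32 are illustrative, not taken from any particular framework. CRC32 is used because Python's built-in `hash()` for strings varies between processes, which would break the guarantee once mappers run in separate processes.

```python
import zlib
from collections import defaultdict

def partition(key, num_reducers):
    """Choose a reducer index from a stable hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def shuffle(mapper_outputs, num_reducers):
    """Route every (key, value) pair to the bucket of its reducer."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for output in mapper_outputs:
        for key, value in output:
            buckets[partition(key, num_reducers)][key].append(value)
    return buckets

# Illustrative outputs of two independent mappers.
mapper_outputs = [
    [("fox", 1), ("the", 1)],
    [("the", 1), ("dog", 1), ("the", 1)],
]
buckets = shuffle(mapper_outputs, num_reducers=3)

# Each bucket can now be reduced independently of the others.
totals = {}
for bucket in buckets:
    for key, values in bucket.items():
        totals[key] = sum(values)
print(totals)
```

Because the partition depends only on the key, two mappers that have never communicated still send "the" to the same bucket.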

"Map" step: The master node takes the input, chops it up into smaller sub-problems, and distributes those to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes that smaller problem, and passes the answer back to its master node.

"Reduce" step: The master node then takes the answers to all the sub-problems and combines them in some way to get the output - the answer to the problem it was originally trying to solve.
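The two steps above can be sketched with a thread pool standing in for the worker nodes. This is only an illustration under that assumption: `split`, `worker`, and `master` are hypothetical names, the sub-problem (a sum of squares) is arbitrary, and threads on one machine do not provide the distribution a real cluster does.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, num_chunks):
    """Master side: chop the input into at most num_chunks sub-problems."""
    size = max(1, -(-len(data) // num_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def worker(chunk):
    """Worker side: solve one sub-problem and return its partial answer."""
    return sum(x * x for x in chunk)

def master(data, num_workers=4):
    """Distribute the chunks ("map" step), then combine ("reduce" step)."""
    chunks = split(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(worker, chunks))
    return sum(partials)

result = master(list(range(10)))
print(result)
```

A worker could itself call `master` on its chunk, which gives the multi-level tree structure mentioned in the "Map" step.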

12151 questions

votes

2 answers

Difference in calling the job

What is the difference between calling a MapReduce job from main() and from ToolRunner.run()? When we say that the main class, say MapReduce, extends Configured implements Tool, what are the additional privileges we get which we do not have if we…

java hadoop mapreduce

asked Mar 25 '12 at 11:01

Ravi Trivedi

votes

4 answers

Getting Started with Avro

I want to get started using Avro with MapReduce. Can someone suggest a good tutorial or example to get started with? I couldn't find much through an internet search.

mapreduce avro

asked Mar 29 '11 at 23:48

Sri

votes

3 answers

Hadoop on windows server

I'm thinking about using Hadoop to process large text files on my existing Windows 2003 servers (about 10 quad-core machines with 16 GB of RAM). The questions are: Is there any good tutorial on how to configure a Hadoop cluster on Windows? What are…

c# windows hadoop mapreduce cluster-computing

asked Jan 22 '09 at 02:58

Luca Martinetti

votes

7 answers

/bin/bash: /bin/java: No such file or directory error in Yarn apps in MacOS

I was trying to run a simple word count MapReduce program using the Java 1.7 SDK and Hadoop 2.7.1 on Mac OS X El Capitan 10.11, and I am getting the following error message in my container log "stderr": /bin/bash: /bin/java: No such file or…

java macos hadoop mapreduce hadoop-yarn

asked Nov 28 '15 at 06:27

Gangadhar Kadam

votes

4 answers

creating partition in external table in hive

I have successfully created and added dynamic partitions in an internal table in Hive, using the following steps: 1- created a source table 2- loaded data from local into the source table 3- created another table with partitions - partition_table 4-…

hadoop hive mapreduce hbase

asked Sep 15 '15 at 07:39

Anoop Mamgain

votes

2 answers

Why is Spark faster than Hadoop Map Reduce

Can someone explain, using the word count example, why Spark would be faster than MapReduce?

mapreduce apache-spark

asked Sep 14 '15 at 19:34

Victor

votes

2 answers

Hadoop Mapper is failing because of "Container killed by the ApplicationMaster"

I am trying to execute a MapReduce program on Hadoop. When I submit my job to the Hadoop single-node cluster, the job gets created but fails with the message "Container killed by the ApplicationMaster". The input used is of the size 10…

java linux hadoop mapreduce

asked May 29 '15 at 15:30

Harry

votes

3 answers

How to use Cassandra's Map Reduce with or w/o Pig?

Can someone explain how MapReduce works with Cassandra .6? I've read through the word count example, but I don't quite follow what's happening on the Cassandra end vs. the "client"…

mapreduce cassandra apache-pig

asked Apr 29 '10 at 00:17

Brent

votes

3 answers

MongoDB Aggregation Framework performance slow over millions of documents

Background: Our system is carrier grade and extremely robust; it has been load tested to handle 5000 transactions per second, and for each transaction a document is inserted into a single MongoDB collection (no updates or queries in this application,…

mongodb indexing mapreduce aggregation-framework

asked Nov 14 '13 at 15:57

Ashley Brener

votes

4 answers

Hive ParseException - cannot recognize input near 'end' 'string'

I am getting the following error when trying to create a Hive table from an existing DynamoDB table: NoViableAltException(88@[]) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:9123) at…

hadoop mapreduce hive bigdata amazon-dynamodb

asked Sep 05 '13 at 15:47

Jens Roland

votes

5 answers

How to build OpenCV with Java under Linux using command line?(Gonna use it in MapReduce)

Recently I've been trying OpenCV out for my graduation project. I've had some success under the Windows environment. And because the Windows package of OpenCV comes with pre-built libraries, I don't have to worry about how to build them. But since the…

java linux opencv build mapreduce

asked Jun 30 '13 at 02:26

user2535650

votes

7 answers

How to specify KeyValueTextInputFormat Separator in Hadoop-.20 api?

In the new API (apache.hadoop.mapreduce.KeyValueTextInputFormat), how do I specify a separator (delimiter) other than tab (which is the default) to separate key and value? Sample Input : one,first line two,second line Output Required : Key : one Value : first…

java hadoop mapreduce

asked Feb 09 '12 at 12:51

Pradeep Bhadani

votes

2 answers

State of Map-Reduce on Appengine?

There is appengine-mapreduce, which seems to be the official way to do things on AppEngine. But there seems to be no documentation besides some hacked-together wiki pages and lengthy videos. There are statements that the lib only supports the map step. But the…

google-app-engine mapreduce

asked Dec 07 '11 at 07:30

max

votes

2 answers

How to customize Writable class in Hadoop?

I'm trying to implement a Writable class, but I have no idea how to implement it if my class contains nested objects, such as a list, etc. Could anybody help me? Thanks. public class StorageClass implements Writable{ public String…

java hadoop mapreduce

asked Nov 03 '11 at 11:56

afancy

votes

1 answer

How to translate from SQL to NoSQL/MapReduce?

I have a background working with relational databases but recently started to dabble in CouchDB and was surprised by how some non-relational operations, which would be simple in SQL, were not first-class functions in CouchDB. I would appreciate you…

sql database nosql couchdb mapreduce

asked Jun 25 '11 at 18:11

sferik
