Questions tagged [mapper]

The mapper is the first step in the MapReduce framework, a component of a larger scalable, parallelizable algorithm.

Maps input key/value pairs to a set of intermediate key/value pairs.

Maps are the individual tasks that transform input records into intermediate records. The transformed intermediate records need not be of the same type as the input records. A given input pair may map to zero or many output pairs.
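To make the definition concrete, here is a minimal in-memory sketch in Python (not Hadoop's API). A word-count mapper turns each (offset, line) input pair into zero or more (word, 1) intermediate pairs, so the output types differ from the input types and an empty line yields no output. The grouping step only stands in for the framework's shuffle, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def word_count_mapper(offset: int, line: str) -> Iterator[Tuple[str, int]]:
    """Map one (offset, line) input pair to zero or more (word, 1) pairs.

    Input types are (int, str) while output types are (str, int): intermediate
    records need not match the input type.
    """
    for word in line.split():
        yield word.lower(), 1


def map_and_group(records: Iterable[Tuple[int, str]]) -> Dict[str, List[int]]:
    """Run the mapper over every input pair, then group values by key.

    The grouping stands in for the framework's shuffle; a reducer would
    then receive each key together with its list of values.
    """
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in records:
        for out_key, out_value in word_count_mapper(key, value):
            grouped[out_key].append(out_value)
    return dict(grouped)


records = [(0, "the quick fox"), (14, ""), (15, "The fox")]
print(map_and_group(records))
# {'the': [1, 1], 'quick': [1], 'fox': [1, 1]}
```

The record at offset 14 is an empty line, so the mapper emits nothing for it: that is the "zero output pairs" case.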

The most common MapReduce framework is Apache Hadoop.

See also MapReduce Wiki.

653 questions
0 votes · 1 answer

Output blank from Python Hadoop mapper

The input text looks like this, repeated a kabillion times:

value1 | foo="bar" value2 | value3

I wrote a basic mapper in Python for a basic streaming job:

#!/usr/bin/env python
import sys
for line in sys.stdin:
    line = line.replace('foo=','')
    line =…
Todd Curry
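For reference, a Hadoop Streaming mapper only has to read lines from stdin and write key/value lines to stdout; by default the text before the first tab is treated as the key. A minimal Python 3 sketch follows. The '|'-splitting in `parse_line` is a hypothetical guess at the input format, since the excerpt above is truncated.

```python
#!/usr/bin/env python3
"""Minimal Hadoop Streaming mapper sketch: stdin lines in, key<TAB>value lines out."""
import sys
from typing import Iterable, Iterator, Optional, Tuple


def parse_line(line: str) -> Optional[Tuple[str, str]]:
    """Hypothetical parser: assumes '|'-separated fields and uses the first as the key."""
    line = line.strip().replace('foo=', '')  # same cleanup as in the question
    if not line:
        return None  # blank input lines produce no output
    fields = [field.strip() for field in line.split('|')]
    return fields[0], '|'.join(fields[1:])


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield one 'key<TAB>value' output line per parsable input line."""
    for raw in lines:
        pair = parse_line(raw)
        if pair is not None:
            key, value = pair
            # Streaming splits each output line at the first tab by default.
            yield f"{key}\t{value}\n"


if __name__ == "__main__":
    # Output must go to stdout; anything written to stderr is not job output.
    sys.stdout.writelines(map_lines(sys.stdin))
```

Run locally, piping the sample input through the script (saved under a hypothetical name such as mapper.py) should print one tab-separated line per non-blank input line; if it prints nothing there, the problem is in the script rather than in Hadoop.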
0 votes · 1 answer

Running Mappers and Reducers on different Groups of machines

We have a nice, big, complicated elastic-mapreduce job that has wildly different constraints on hardware for the Mapper vs Collector vs Reducer. The issue is: for the Mappers, we need tonnes of lightweight machines to run several mappers in…
0 votes · 1 answer

Get the total mapping and reducing times in Hadoop programmatically

I am trying to calculate the total time spent in mapping, shuffling, and reducing across all tasks in my MR code. I need help retrieving that information for each MapReduce job. Can someone post a code snippet that does that calculation?
0 votes · 1 answer

Using CountVectorizer in Python Mapper Reducer

I am trying to apply a tokenizer using Python mapper/reducer functions. I have the following code, but I keep getting an error. The reducer outputs values in a list, and I am passing the values to the vectorizer.

from mrjob.job import MRJob
from…
Sohail
0 votes · 1 answer

Ant: How to write a mapper for parent folder

Environment:
Windows 2008 R2
JDK: 1.7.0_45 (x64)
Ant: 1.8.3

I'm trying to extract a few cab files. For the purpose of discussion, assume the following layout:

A/L/X.cab
B/M/Y.cab
C/N/Z.cab

What I tried:
Parag Doke
0 votes · 1 answer

Passing parameters in Hadoop

I know that a Writable object can be passed to a mapper using something like:

DefaultStringifier.store(conf, object, "key");
object = DefaultStringifier.load(conf, "key", Class);

My question is: in a mapper I read out the object, then change the…
0 votes · 2 answers

Using IQueryable deferred execution

I am working on a simple EntityFramework <> DTO mapping. It's working perfectly except for the deferred execution. I have the following code:

public abstract class Assembler : IAssembler where TEntity :…
Marc
0 votes · 1 answer

Accessing file in Pig through Distributed Cache

I went through many pages on Stack Overflow regarding this, but I am still confused. Even if this is a duplicate or similar question, please answer. I want to compare one file against another in Pig, and I want one of the files to be in…
Pooja3101
0 votes · 1 answer

Combining two different files in Hadoop

I have a very specific problem in Hadoop. I have two files, userlist and raw_data. raw_data is a pretty big file, and userlist is comparatively small. I have to first identify the number of mappers, and my userlist has to…
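When one input is small, a common approach is a map-side (replicated) join: every mapper loads the small file into memory and filters the big file as it streams past, so no reducer is needed for the join itself. In Hadoop the small file is usually shipped to each node through the distributed cache. The Python sketch below assumes, hypothetically, that userlist holds one user ID per line and that each raw_data record starts with a tab-terminated user ID; it passes both inputs in as plain lists rather than using any Hadoop API.

```python
from typing import Iterable, Iterator, Set


def load_userlist(lines: Iterable[str]) -> Set[str]:
    """Build an in-memory set from the small file (assumed: one user ID per line)."""
    return {line.strip() for line in lines if line.strip()}


def join_mapper(raw_lines: Iterable[str], users: Set[str]) -> Iterator[str]:
    """Emit only raw_data records whose first tab-separated field is a known user."""
    for line in raw_lines:
        user_id, sep, rest = line.rstrip("\n").partition("\t")
        # Skip malformed lines (no tab) and users missing from the small file.
        if sep and user_id in users:
            yield f"{user_id}\t{rest}"


users = load_userlist(["u1\n", "u3\n", "\n"])
for record in join_mapper(["u1\tclick\n", "u2\tview\n", "u3\tbuy\n"], users):
    print(record)
```

The design choice here is that the set lookup costs O(1) per record, so the big file is read once and never shuffled just to be joined; this only works while the small file fits in each mapper's memory.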
0 votes · 2 answers

Set the number of map tasks

While configuring a MapReduce job, I know that one can set the number of reduce tasks by using the method job.setNumReduceTasks(2). Can we set the number of map tasks? I don't see any method to do this. If there is no such functionality, does…
Surender Raja
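As background, FileInputFormat-based jobs derive the number of map tasks from the number of input splits rather than from a direct setter. The Python sketch below is modeled on the split-size rule in Hadoop's FileInputFormat.getSplits, splitSize = max(minSize, min(maxSize, blockSize)), with a 1.1 slop factor that lets the last split run slightly over. It assumes a single splittable, non-empty file and is an approximation, not Hadoop's code.

```python
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    """splitSize = max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size: int, split_size: int) -> int:
    """Approximate split (map task) count for one splittable, non-empty file."""
    splits = 0
    remaining = file_size
    # Carve off full-size splits while more than SPLIT_SLOP splits' worth remains.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1  # whatever is left becomes the final split
    return splits


MB = 1024 * 1024
size = compute_split_size(block_size=128 * MB, min_size=1, max_size=2**63 - 1)
print(count_splits(300 * MB, size))  # 3
print(count_splits(260 * MB, size))  # 2: the 132 MB tail fits within the slop
```

Per the formula, raising the minimum split size above the block size produces larger splits and therefore fewer map tasks.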
0 votes · 1 answer

MapReduce programming model: can mappers communicate with each other during the map process?

I know that reduce tasks must run independently and in isolation. But for mappers, it looks like there's a chance for them to communicate with each other? If so, please explain.
Clark
0 votes · 0 answers

Object mapper conversion issue when a string contains '\/' patterns

I have the following type of token that has to be sent as-is in the request:

//token contains following value
token = "ABCD/saljljlkljljljl";
class Token{
    public String token;
    public String getToken() { return token; }
    public…
developer
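The excerpt doesn't show which object mapper is in use. As a language-neutral reference point, JSON allows the two-character sequence \/ as an escape for /, so a conforming parser should return a plain slash. A quick check with Python's standard json module (not the Java library the question appears to use):

```python
import json

# Inside a JSON string literal, "\/" is a legal escape for "/".
encoded = '{"token": "ABCD\\/saljljlkljljljl"}'
decoded = json.loads(encoded)
print(decoded["token"])  # ABCD/saljljlkljljljl

# Python's encoder writes "/" without escaping it.
print(json.dumps(decoded))  # {"token": "ABCD/saljljlkljljljl"}
```

If a library emits ABCD\/... on the wire, that is still the same string value; only a receiver that does not parse the JSON would see the backslash.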
0 votes · 1 answer

Hadoop: reading a file as a whole and sending it to many mappers

I am writing a Hadoop app where I want to read the input file as a whole, send it to many mappers, and let each mapper do part of the job. Here is my FileInputFormat. I have to make isSplitable return false so that I can read the whole file.…
flexwang
0 votes · 1 answer

How to create an AsSomeClass in LINQ?

I have this 'method':

private static readonly Expression> AsSomeClass = x => new SomeClass { };

which lets me do:

_ctx.EntityClasses.Where(e => e.SomeProperty ==…
Bart Calixto
0 votes · 1 answer

Map QWidget to variable

The idea was to connect a QWidget to a variable so that when the text changes in the widget, the variable changes too, and to do this with just one line, like this:

WidgetMapper::connect(ui->lineEdit, SIGNAL(textChanged(QString)),…
Stals