Questions tagged [mapper]

The mapper is the first stage of the MapReduce framework, one component of a larger scalable, parallelizable algorithm.

A mapper maps input key/value pairs to a set of intermediate key/value pairs.

Maps are the individual tasks that transform input records into intermediate records. The transformed intermediate records need not be of the same type as the input records. A given input pair may map to zero or many output pairs.
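As a rough illustration of the description above, here is a minimal sketch of a word-count style mapper in plain Python. It does not use Hadoop or any MapReduce library; the function name `word_mapper` and the sample records are assumptions made up for this example. It shows an input pair of type (int, str) producing intermediate pairs of a different type (str, int), and an empty line producing zero output pairs.

```python
from typing import Iterator, List, Tuple


def word_mapper(offset: int, line: str) -> Iterator[Tuple[str, int]]:
    """Map one (offset, line) input record to zero or more (word, 1) pairs.

    Hypothetical helper for illustration only, not part of any framework.
    """
    for word in line.split():
        # The intermediate key/value types (str, int) differ from the
        # input key/value types (int, str).
        yield word.lower(), 1


# Made-up input records: (byte offset, line of text).
records: List[Tuple[int, str]] = [
    (0, "hello world"),
    (12, ""),            # an empty line maps to zero output pairs
    (13, "bye world"),
]

# Apply the mapper to every record and collect the intermediate pairs.
intermediate = [
    pair
    for offset, line in records
    for pair in word_mapper(offset, line)
]

print(intermediate)
```

In a real framework the intermediate pairs would then be grouped by key and passed to a reducer; that step is omitted here.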

The most common MapReduce framework is Apache Hadoop.

See also MapReduce Wiki.

653 questions
0
votes
1 answer

Time spent by a Hadoop MapReduce mapper task to read input files from HDFS or S3

I am running a Hadoop MapReduce job, getting input files from HDFS or Amazon S3. I am wondering whether it is possible to know how long it takes a mapper task to read a file from HDFS or S3. I'd like to know the time just for reading…
0
votes
1 answer

mrJob python mapReduce word_count.py

I have just started using mrJob (MapReduce for Python) and am new to the MapReduce paradigm. I would like to know the following about the word_count.py tutorial that is present on the MRJob documentation site. The docs say that if we create a…
anonuser0428
  • 11,789
  • 22
  • 63
  • 86
0
votes
1 answer

Dozer mapping of Property of a Object in the list to another list with hints

I have the following mapping scenario: Class Contact { List marketSectorList; } Class SimpleCode { protected String code; protected String label; } Class ContactTarget { List marketSectors; } The following map is…
0
votes
1 answer

Split Class not found during Hadoop Execution

I am having a strange issue. I have my own implementation of a filesystem instead of the default DistributedFileSystem. I have added my filesystem in fs.default.name and impl. When Hadoop execution is started (teragen program), I could see that writes and…
GoT
  • 530
  • 1
  • 13
  • 35
0
votes
1 answer

Mybatis Mapper XML

I have a shipping information object that matches the MySQL table it is stored in, with the exception that the "Address" is an object (in the object) but an AddressId in the DB table. My current result map looks like this:
Ginto Hewoo
  • 73
  • 1
  • 1
  • 8
0
votes
1 answer

Default Mapper-Reducer class

Suppose I have two datasets: hello world bye world and hello earth new earth, and I want to run a map-reduce task which does not specify a mapper class or reducer class, so the default mapper and reducer will be called, which are both identity…
Ronin
  • 2,027
  • 8
  • 32
  • 39
0
votes
1 answer

How to fetch Objects from Model/Service layer

In an app we are developing, we have Services, Mappers and Entities. We are not using an ORM. In the app, we have Group, GroupMember & Member entities. The GroupMember entity has the groupId, memberId & memberAccess properties. The memberAccess…
Bryan
  • 645
  • 1
  • 6
  • 18
0
votes
2 answers

Map Job Performance on cluster

Suppose I have 15 blocks of data and two clusters. The first cluster has 5 nodes and a replication factor of 1, while the second one has a replication factor of 3. If I run my map job, should I expect any change in the performance or the execution…
nj2012
  • 105
  • 2
  • 14
0
votes
0 answers

Complex Mapping from Source having nested classes to destination having nested classes

In my application I have to use mapping. The structure of my source object is: public class SourceContract { public string ContractNo {get; set;} public string ContractDescription {get; set;} List SourceSalaries…
user2739679
  • 827
  • 4
  • 14
  • 24
0
votes
1 answer

What is the namespace for Mapper.Map in asp.net mvc?

I am working on ASP.NET MVC4 and I want to map my view model to my database table object. For that I want to use Mapper.Map, but I don't know its namespace. Can anyone suggest the namespace for it?
Pawan
  • 2,150
  • 11
  • 45
  • 73
0
votes
1 answer

How can I write an iteration in Python using mrjob mapper reducer, for which the counter is a part of the computation in the loop?

I have a program that iterates a mapper and a reducer n times consecutively. However, for each iteration, the mapper of each key-value pair computes a value that depends on n. from mrjob.job import mrjob class MRWord(mrjob): def…
Pippi
  • 2,451
  • 8
  • 39
  • 59
0
votes
1 answer

Hadoop: Output file has double output

I am running a Hadoop program and have the following as my input file, input.txt: 1 2 mapper.py: import sys for line in sys.stdin: print line, print "Test" reducer.py: import sys for line in sys.stdin: print line, When I run it without…
Objc55
  • 156
  • 1
  • 5
  • 18
0
votes
1 answer

Hadoop mapper compress output doesn't work?

I am using Hadoop CDH 4.1.2, and my mapper program is almost an echo of the input data. But on my job status page, I saw that FILE: Number of bytes written 3,040,552,298,327 is almost equal to FILE: Number of bytes read 3,363,917,397,416 for mappers,…
Shawn
  • 1,441
  • 4
  • 22
  • 36
0
votes
2 answers

Wordcount: More than 1 map task per block, with speculative execution off

In Wordcount, it appears that you can get more than 1 map task per block, with speculative execution off. Does the jobtracker do some magic under the hood to distribute more tasks than the InputSplits provide?
jayunit100
  • 17,388
  • 22
  • 92
  • 167
0
votes
1 answer

How to run my MapReduce program in Eclipse

I have a Hadoop environment on a server, and I develop on my local PC. I have written a MapReduce class (overriding the Mapper class only) in Eclipse and set the corresponding configuration in a main method. Now I want to run my program in Eclipse, but I…
mashroom
  • 13
  • 6