I have a large file of data, and my task is to use MapReduce to create new data from each line of the file. For example, the file contains the expression `3-4*7-4`, and I need to create a new expression with randomly chosen operators, such as `3+4/7*4`. In my implementation the mapper does the transformation, and the reducer just receives the data from the mapper and sorts it. Is it correct to use only the map phase for the main task?
- If you want to `sort` them, you must use the reducer. The map phase only transforms your data. But from your description, it looks like you don't need to sort the result? – zsxwing Mar 05 '14 at 01:47
- Does this answer your question? [How to write 'map only' hadoop jobs?](https://stackoverflow.com/questions/9394409/how-to-write-map-only-hadoop-jobs) – Vassopoli Oct 16 '22 at 13:02
2 Answers
Your implementation is correct. Just make sure the keys output from the mapper are all unique if you don't want expressions that happen to be identical to be combined.
For example, since you said you have a huge data file, you might get two input expressions such as `3-4*7-4` and `3*4/7+4` whose new, randomly generated expressions both turn out to be `3+4*7-4`. If you use the expression as the key, the reducer will only get called once for both expressions. If you don't want this to happen, make sure you use a unique number for each key.
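The key-collision point can be seen in a plain-Java sketch with no Hadoop dependency (the class and method names here are illustrative, not from the answer; the "offset" key mimics the byte offset that Hadoop's `TextInputFormat` already hands to the mapper):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class UniqueKeyDemo {
    // Keying by the expression itself: identical expressions collapse
    // into one key, so the reducer would see them only once.
    static int countByExpression(String[] exprs) {
        Set<String> keys = new HashSet<>();
        for (String expr : exprs) {
            keys.add(expr);
        }
        return keys.size();
    }

    // Keying by a unique number (here, the line's byte offset):
    // every record keeps its own key, so nothing is merged.
    static int countByOffset(String[] exprs) {
        Map<Long, String> keys = new HashMap<>();
        long offset = 0;
        for (String expr : exprs) {
            keys.put(offset, expr);
            offset += expr.length() + 1; // +1 for the newline
        }
        return keys.size();
    }

    public static void main(String[] args) {
        // Two different inputs whose random rewrites happen to be identical.
        String[] rewritten = {"3+4*7-4", "3+4*7-4"};
        System.out.println("keys by expression: " + countByExpression(rewritten)); // 1
        System.out.println("keys by offset:     " + countByOffset(rewritten));     // 2
    }
}
```

With the expression as key, the two records collapse to one; with the offset as key, both survive.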

– LeonardBlunderbuss
If you do not need the map results sorted, set the number of reduce tasks to zero by calling
`job.setNumReduceTasks(0);`
in your driver code. Such a job is called a map-only job.
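For context, a minimal map-only driver might look like the following sketch (it is job configuration only and assumes a Hadoop cluster; `MapOnlyDriver`, `ExpressionMapper`, and the path arguments are placeholder names — only the `setNumReduceTasks(0)` call comes from the answer):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only expression rewrite");
        job.setJarByClass(MapOnlyDriver.class);
        job.setMapperClass(ExpressionMapper.class); // hypothetical mapper that rewrites each line

        // No reducers: mapper output is written straight to the output
        // directory, one file per map task, with no shuffle or sort.
        job.setNumReduceTasks(0);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Because the shuffle and sort phases are skipped entirely, a map-only job is also cheaper than one with an identity reducer.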

– Chirag