I have a large file of data, and my task is to use MapReduce to create new data from each line of the file. For example, the file contains the expression `3-4*7-4`, and I need to create a new expression with randomly chosen operators, such as `3+4/7*4`. In my implementation the mapper does the transformation, and the reducer just receives the data from the mapper and sorts it. Is it correct to use only the map phase for the main task?
- If you want to `sort` them, you must use the reducer. The map phase only transforms your data. But from your description, it looks like you don't need to sort the result? – zsxwing Mar 05 '14 at 01:47
- Does this answer your question? [How to write 'map only' hadoop jobs?](https://stackoverflow.com/questions/9394409/how-to-write-map-only-hadoop-jobs) – Vassopoli Oct 16 '22 at 13:02
2 Answers
Your implementation is correct. Just make sure the keys output from the mapper are all unique if you don't want expressions that happen to be identical to be combined.
For example, since you said you have a huge data file, you might get two input expressions such as `3-4*7-4` and `3*4/7+4` whose new, randomly generated expressions both turn out to be `3+4*7-4`. If you use the expression as the key, the reducer will only get called once for both expressions. If you don't want this to happen, make sure you use a unique number for each key.
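The key-collision point can be seen in a plain-Java sketch with no Hadoop dependency (the class and method names here are illustrative, not from the answer; the "offset" key mimics the byte offset that Hadoop's `TextInputFormat` already hands to the mapper):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class UniqueKeyDemo {
    // Keying by the expression itself: identical expressions collapse
    // into one key, so the reducer would see them only once.
    static int countByExpression(String[] exprs) {
        Set<String> keys = new HashSet<>();
        for (String expr : exprs) {
            keys.add(expr);
        }
        return keys.size();
    }

    // Keying by a unique number (here, the line's byte offset):
    // every record keeps its own key, so nothing is merged.
    static int countByOffset(String[] exprs) {
        Map<Long, String> keys = new HashMap<>();
        long offset = 0;
        for (String expr : exprs) {
            keys.put(offset, expr);
            offset += expr.length() + 1; // +1 for the newline
        }
        return keys.size();
    }

    public static void main(String[] args) {
        // Two different inputs whose random rewrites happen to be identical.
        String[] rewritten = {"3+4*7-4", "3+4*7-4"};
        System.out.println("keys by expression: " + countByExpression(rewritten)); // 1
        System.out.println("keys by offset:     " + countByOffset(rewritten));     // 2
    }
}
```

With the expression as key, the two records collapse to one; with the offset as key, both survive.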

– LeonardBlunderbuss
If you do not need the map results sorted, set the number of reduce tasks to zero by calling
`job.setNumReduceTasks(0);`
in your driver code. Such a job is called a map-only job.
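For context, a minimal map-only driver might look like the following sketch (it is job configuration only and assumes a Hadoop cluster; `MapOnlyDriver`, `ExpressionMapper`, and the path arguments are placeholder names — only the `setNumReduceTasks(0)` call comes from the answer):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only expression rewrite");
        job.setJarByClass(MapOnlyDriver.class);
        job.setMapperClass(ExpressionMapper.class); // hypothetical mapper that rewrites each line

        // No reducers: mapper output is written straight to the output
        // directory, one file per map task, with no shuffle or sort.
        job.setNumReduceTasks(0);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Because the shuffle and sort phases are skipped entirely, a map-only job is also cheaper than one with an identity reducer.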

– Chirag