I have had problems using `org.apache.hadoop.mapred.lib.IdentityMapper` as the argument to `-mapper` in Hadoop Streaming 1.0.3. `cat` works, though. Does using `cat` instead affect performance, especially on Elastic MapReduce?
- I don't think there is a big difference in performance, except that the `cat` command relies on the cluster's worker nodes running *nix, whereas the Java implementation works anywhere. – morsik Jul 30 '14 at 13:59
- Every task (map or reduce) executes on a tasktracker (or in a container under YARN). I meant that you can use the `cat` command only on *nix servers. Please post your code to clarify the question: how do you run your job? – morsik Aug 07 '14 at 11:28
1 Answer
I faced a similar issue: the identity mapper didn't work, and I had to use `cat` instead.
We did not see a big change in performance. As far as I know, the difference is that the identity mapper is a Java class shipped in a jar, whereas `cat` is a Unix command.
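
If `cat` is not available on the worker nodes (the portability concern raised in the comments), a small script can play the same role. The following is only a minimal sketch, assuming Python 3 is installed on every task node; the file name `identity_mapper.py` and the function name `identity_map` are hypothetical and do not come from the question.

```python
#!/usr/bin/env python3
"""Identity mapper sketch for Hadoop Streaming: copy stdin to stdout unchanged."""
import shutil
import sys


def identity_map(src, dst, chunk_size=64 * 1024):
    """Copy every byte from src to dst without interpreting records.

    Working on raw bytes keeps keys, values, and tab separators exactly
    as they arrived, which is what `cat` does as well.
    """
    shutil.copyfileobj(src, dst, chunk_size)


if __name__ == "__main__":
    # Hadoop Streaming feeds input records on stdin and reads the
    # mapper's output from stdout.
    identity_map(sys.stdin.buffer, sys.stdout.buffer)
    sys.stdout.buffer.flush()
```

Such a script would be passed as the `-mapper` argument in place of `cat` and shipped to the nodes with the streaming `-file` option. Like `cat`, it runs as an external process, so it should not be expected to perform any better than `cat`.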

Bowdzone

sundeep veeramachaneni