2

I have had problems using org.apache.hadoop.mapred.lib.IdentityMapper as the argument to -mapper in Hadoop Streaming 1.0.3. "cat" works, though. Does using cat affect performance, especially on Elastic MapReduce?

verve
  • I don't think there's a big difference in performance, except that the `cat` command requires the cluster's worker nodes to run *nix, whereas the Java implementation works on any platform. – morsik Jul 30 '14 at 13:59
  • Every task (map or reduce) executes on a TaskTracker (or in a container under YARN). I meant that you can use the cat command only on *nix servers. Please post your code to clarify the question: how do you run your job? – morsik Aug 07 '14 at 11:28

1 Answer

0

I faced a similar issue: the identity mapper didn't work, and I had to use cat instead.

We did not see a big change in performance. As far as I know, the difference is that the identity mapper is a Java class shipped in a jar, whereas cat is a Unix command.
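For illustration only, here is roughly what cat does when it is used as a streaming mapper: it reads records from standard input and writes them unchanged to standard output. This is a minimal Python sketch, not something measured or taken from the question; the function name `identity_map` is made up for the example.

```python
import sys


def identity_map(lines, out):
    """Copy each input record to the output unchanged, like `cat`.

    `identity_map` is a hypothetical name used only for this sketch.
    """
    for line in lines:
        out.write(line)


if __name__ == "__main__":
    # Hadoop Streaming feeds input records on stdin and collects
    # the mapper's output from stdout.
    identity_map(sys.stdin, sys.stdout)
```

A script like this could presumably be passed to -mapper in place of cat, assuming Python is installed on the worker nodes and the script is shipped with the job (not shown here).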

Bowdzone