This is my Hadoop job:
hadoop streaming \
-D mapred.map.tasks=1 \
-D mapred.reduce.tasks=1 \
-mapper "awk '{if(\$0<3)print}'" \ # doesn't work
-reducer "cat" \
-input "/user/***/input/" \
-output "/user/***/out/"
This job always fails with the following error:
sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `export TMPDIR='..../work/tmp'; /bin/awk { if ($0 < 3) print } '
But if I change the -mapper to this:
-mapper "awk '{print}'"
it works without any error. What's the problem with the if(..)?
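The error above can be reproduced outside Hadoop. This is a minimal sketch, assuming (as the `sh: -c:` prefix in the log suggests) that the mapper string ends up being run through `sh -c` with the single quotes already stripped. It only illustrates the shell behavior and is not a description of Hadoop's internals.

```shell
# Quotes lost: the inner shell sees the bare parentheses and rejects the line
# before awk ever runs. The exact wording of the message depends on which
# shell `sh` is, so it is captured rather than matched.
bad=$(printf '1\n2\n5\n' | sh -c 'awk { if ($0 < 3) print }' 2>&1) || true
echo "$bad"

# Quotes kept: awk receives the whole program as a single argument
# and keeps only the lines whose value is below 3.
good=$(printf '1\n2\n5\n' | sh -c "awk '{ if (\$0 < 3) print }'")
echo "$good"
```

The second command prints 1 and 2, which suggests the awk program itself is fine and the trouble is in how the quoting reaches the shell.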
UPDATE:
Thanks, @paxdiablo, for your detailed answer.
What I really want to do is filter out some data whose 1st column is greater than x before piping the input data to my custom bin. So the -mapper actually looks like this:
-mapper "awk -v x=$x{if($0<x)print} | ./bin"
Is there any other way to achieve that?
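One possible direction, sketched under assumptions: move the pipeline into a small wrapper script so that the awk program never has to survive re-quoting inside the -mapper string. The name `filter.sh` is hypothetical, the script compares the first column (`$1`) as described in the text rather than `$0`, and `cat` stands in for `./bin` so the check can run locally. The wiring into the streaming job shown in the final comment is an assumption and is untested here.

```shell
# Scratch directory for the hypothetical wrapper script.
workdir=$(mktemp -d)

# The quoted heredoc delimiter keeps $1 literal while the file is written.
cat > "$workdir/filter.sh" <<'EOF'
#!/bin/sh
# Usage: filter.sh THRESHOLD
# Keeps records whose first column is below THRESHOLD and passes them on.
# In the real job, replace `cat` with ./bin.
awk -v x="$1" '$1 < x' | cat
EOF

# Local check with threshold 3: only the rows whose first column is 1 or 2 pass.
filtered=$(printf '1 a\n5 b\n2 c\n' | sh "$workdir/filter.sh" 3)
echo "$filtered"

rm -r "$workdir"

# In the streaming job this might then be wired up roughly as (assumption):
#   -mapper "filter.sh $x" -file filter.sh -file bin
```

Because the awk program lives in a file, the only thing left inside the -mapper string is a script name and a plain argument, neither of which contains characters the shell treats specially.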