I am using Hadoop streaming, I start the script as following:
../hadoop/bin/hadoop jar ../hadoop/contrib/streaming/hadoop-streaming-1.0.4.jar \
-mapper ../tests/mapper.php \
-reducer ../tests/reducer.php \
-input data \
-output out
"data" is 2.5 GB txt file.
however in ps axf I can see only one mapper. i tried with -Dmapred.map.tasks=10, but result is the same - single mapper.
how can I make hadoop split my input file and start several mapper processes?