The problem I'm encountering is this: having already put my input.txt file (50 MB) into HDFS, I'm running:
python ./test.py hdfs:///user/myself/input.txt -r hadoop --hadoop-bin /usr/bin/hadoop
It seems that mrjob spends a lot of time copying files to HDFS (again?):
Copying local files into hdfs:///user/myself/tmp/mrjob/test.myself.20150927.104821.148929/files/
Is this expected? Shouldn't it use input.txt directly from HDFS?
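To make my expectation concrete: the input argument carries an hdfs:// scheme, so I would expect a runner to be able to tell it apart from a local path and skip uploading it. The following is only a minimal sketch of that check, not mrjob's actual logic; is_remote_uri is a hypothetical helper I made up for illustration.

```python
from urllib.parse import urlparse


def is_remote_uri(path):
    """Return True if path looks like a URI with a scheme, e.g. hdfs://...

    Hypothetical helper for illustration only; not part of mrjob.
    """
    # Require "://" so that Windows drive letters are not mistaken for schemes.
    return bool(urlparse(path).scheme) and "://" in path


# The two paths from the command line above.
args = ["./test.py", "hdfs:///user/myself/input.txt"]

for arg in args:
    kind = "already remote" if is_remote_uri(arg) else "local, would need uploading"
    print(f"{arg}: {kind}")
```

By this reasoning only ./test.py is local, so I don't understand why the copy step would take long for a 50 MB input that is already in HDFS.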
(Using Hadoop version 2.6.0)