2

I am trying to learn MapReduce programming with Python's mrjob, and I am getting the following error:

Traceback:

dumping stdin to local file /tmp/pyes_mrjob.testuser.20131004.103251.998597/STDIN
Making directory hdfs:///user/testuser/tmp/mrjob/pyes_mrjob.user.20131004.103251.998597/files/ on HDFS
> /usr/lib/hadoop-mapreduce/bin/hadoop fs -mkdir hdfs:///user/testuser/tmp/mrjob/pyes_mrjob.testuser.20131004.103251.998597/files/
Traceback (most recent call last):
  File "pyes_mrjob.py", line 34, in <module>
    MRWordFrequencyCount.run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 500, in run
    mr_job.execute()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 518, in execute
    super(MRJob, self).execute()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 146, in execute
    self.run_job()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 207, in run_job
    runner.run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/runner.py", line 458, in run
    self._run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/hadoop.py", line 236, in _run
    self._upload_local_files_to_hdfs()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/hadoop.py", line 263, in _upload_local_files_to_hdfs
    self._mkdir_on_hdfs(self._upload_mgr.prefix)
  File "/usr/local/lib/python2.7/dist-packages/mrjob/hadoop.py", line 271, in _mkdir_on_hdfs
    self.invoke_hadoop(['fs', '-mkdir', path])
  File "/usr/local/lib/python2.7/dist-packages/mrjob/fs/hadoop.py", line 81, in invoke_hadoop
    proc = Popen(args, stdout=PIPE, stderr=PIPE)
  File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1249, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

When I run the command manually it works fine, but when I run my program it fails. Since I have just started learning, can someone suggest which library I should choose? According to some blogs, some libraries have good documentation and others have better performance. I came across the post below, which looks old: http://blog.cloudera.com/blog/2013/01/a-guide-to-python-frameworks-for-hadoop/

But many of these libraries have been updated recently, so can someone suggest a library I can start with?

user2695817

2 Answers

5

I guess this problem is caused by the way mrjob calls "hadoop fs -mkdir": if the parent directory of the directory you want to create doesn't exist, -mkdir will fail. That means you have to use "hadoop fs -mkdir -p [path]". Ultimately, you will need to modify the mrjob library manually, in hadoop.py under your mrjob install path (mine is /usr/lib/python2.6/site-packages/mrjob), at line 271. Change:

self.invoke_hadoop(['fs', '-mkdir', path])

to

self.invoke_hadoop(['fs', '-mkdir', '-p', path])
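
If you would rather not patch the installed library, one alternative is to create the parent directory yourself before running the job. The following is only a rough sketch: `mkdir_args` and `hdfs_mkdir` are hypothetical helper names (not part of mrjob), it assumes the `hadoop` executable is on your PATH, and it assumes your Hadoop version's `fs -mkdir` accepts `-p`.

```python
import subprocess


def mkdir_args(path, parents=True):
    """Build the argument list for 'hadoop fs -mkdir', optionally with -p.

    Hypothetical helper for illustration; not part of mrjob.
    """
    args = ["fs", "-mkdir"]
    if parents:
        # Assumption: this Hadoop version supports -p for creating parents.
        args.append("-p")
    args.append(path)
    return args


def hdfs_mkdir(path, hadoop_bin="hadoop"):
    """Run the mkdir command and return the process exit code."""
    return subprocess.call([hadoop_bin] + mkdir_args(path))
```

For example, `hdfs_mkdir("hdfs:///user/testuser/tmp/mrjob")` would create the parent directory seen in the log above, after which the plain `-mkdir` issued by mrjob has an existing parent to work with.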

Good Luck!

zhutoulala
1

It looks like you set your HADOOP_HOME to "/usr/lib/hadoop-mapreduce". That is wrong; it should be set to "/usr/lib/hadoop".

Also, if you get an error saying that the hadoop-streaming.jar could not be found, create a symlink in "/usr/lib/hadoop" to this jar as follows:

    sudo ln -s /usr/lib/hadoop-mapreduce/hadoop-streaming.jar /usr/lib/hadoop
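
A quick way to check this before running the job is to see whether the hadoop executable can be launched at all. In Python, `Popen` raises `OSError: [Errno 2]` when the executable itself does not exist, which matches the last line of the traceback in the question. The sketch below assumes, as the log line in the question suggests, that the binary is looked up at `bin/hadoop` under HADOOP_HOME; `hadoop_binary` and `can_launch` are hypothetical helper names, not mrjob functions.

```python
import errno
import os
import subprocess


def hadoop_binary(hadoop_home):
    """Return the assumed location of the hadoop executable under HADOOP_HOME."""
    return os.path.join(hadoop_home, "bin", "hadoop")


def can_launch(executable):
    """Try to start the executable; return False if the OS cannot find it."""
    try:
        proc = subprocess.Popen([executable, "version"],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
    except OSError as exc:
        # ENOENT is the "[Errno 2] No such file or directory" case.
        if exc.errno == errno.ENOENT:
            return False
        raise
    proc.communicate()  # wait for the process and drain its pipes
    return True
```

Calling `can_launch(hadoop_binary(os.environ.get("HADOOP_HOME", "/usr/lib/hadoop")))` should return `False` if HADOOP_HOME points at a directory that has no `bin/hadoop` in it.
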
Simikolon