I've been using the mrjob package with Python 3.7 recently. I started Hadoop and created a wordaccount.py file that calculates the frequency of each word in a .txt file. When I tried to run it with python3 wordaccount.py -r hadoop data/hamlet.txt > 1.txt, I ran into the following problem:
xjj@master:/usr/local/hadoop/pyhadoop$ python3 wordaccount.py -r hadoop data/hamlet.txt>1.txt
No configs found; falling back on auto-configuration
No configs specified for hadoop runner
Looking for hadoop binary in /usr/local/hadoop/bin...
Found hadoop binary: /usr/local/hadoop/bin/hadoop
Using Hadoop version 2.7.1
Looking for Hadoop streaming jar in /usr/local/hadoop...
Found Hadoop streaming jar: /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar
Creating temp directory /tmp/wordaccount.xjj.20220522.085604.327723
uploading working dir files to hdfs:///user/xjj/tmp/mrjob/wordaccount.xjj.20220522.085604.327723/files/wd...
STDERR: 22/05/22 16:56:07 WARN hdfs.DFSClient: DataStreamer Exception
STDERR: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/xjj/tmp/mrjob/wordaccount.xjj.20220522.085604.327723/files/wd/mrjob.zip._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
STDERR: at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1550)
STDERR: at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3110)
STDERR: at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3034)
STDERR: at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:723)
STDERR: at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
STDERR: at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
STDERR: at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
STDERR: at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
STDERR: at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
STDERR: at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
STDERR: at java.security.AccessController.doPrivileged(Native Method)
STDERR: at javax.security.auth.Subject.doAs(Subject.java:422)
STDERR: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
STDERR: at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
STDERR:
STDERR: at org.apache.hadoop.ipc.Client.call(Client.java:1476)
STDERR: at org.apache.hadoop.ipc.Client.call(Client.java:1407)
STDERR: at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
STDERR: at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
STDERR: at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
STDERR: at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
STDERR: at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
STDERR: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
STDERR: at java.lang.reflect.Method.invoke(Method.java:498)
STDERR: at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
STDERR: at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
STDERR: at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
STDERR: at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1430)
STDERR: at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1226)
STDERR: at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
STDERR: put: File /user/xjj/tmp/mrjob/wordaccount.xjj.20220522.085604.327723/files/wd/mrjob.zip._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
Traceback (most recent call last):
File "wordaccount.py", line 19, in <module>
WordCount.run()
File "/home/xjj/anaconda3/lib/python3.7/site-packages/mrjob/job.py", line 616, in run
cls().execute()
File "/home/xjj/anaconda3/lib/python3.7/site-packages/mrjob/job.py", line 687, in execute
self.run_job()
File "/home/xjj/anaconda3/lib/python3.7/site-packages/mrjob/job.py", line 636, in run_job
runner.run()
File "/home/xjj/anaconda3/lib/python3.7/site-packages/mrjob/runner.py", line 503, in run
self._run()
File "/home/xjj/anaconda3/lib/python3.7/site-packages/mrjob/hadoop.py", line 328, in _run
self._upload_local_files()
File "/home/xjj/anaconda3/lib/python3.7/site-packages/mrjob/runner.py", line 1156, in _upload_local_files
self._copy_files_to_wd_mirror()
File "/home/xjj/anaconda3/lib/python3.7/site-packages/mrjob/runner.py", line 1257, in _copy_files_to_wd_mirror
self._copy_file_to_wd_mirror(path, name)
File "/home/xjj/anaconda3/lib/python3.7/site-packages/mrjob/runner.py", line 1238, in _copy_file_to_wd_mirror
self.fs.put(path, dest)
File "/home/xjj/anaconda3/lib/python3.7/site-packages/mrjob/fs/composite.py", line 151, in put
return self._handle('put', path, src, path)
File "/home/xjj/anaconda3/lib/python3.7/site-packages/mrjob/fs/composite.py", line 110, in _handle
return getattr(fs, name)(*args, **kwargs)
File "/home/xjj/anaconda3/lib/python3.7/site-packages/mrjob/fs/hadoop.py", line 321, in put
self.invoke_hadoop(['fs', '-put', src, path])
File "/home/xjj/anaconda3/lib/python3.7/site-packages/mrjob/fs/hadoop.py", line 183, in invoke_hadoop
raise CalledProcessError(proc.returncode, args)
subprocess.CalledProcessError: Command '['/usr/local/hadoop/bin/hadoop', 'fs', '-put', '/tmp/wordaccount.xjj.20220522.085604.327723/mrjob.zip', 'hdfs:///user/xjj/tmp/mrjob/wordaccount.xjj.20220522.085604.327723/files/wd/mrjob.zip']' returned non-zero exit status 1.
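Reading through that output, the actual failure seems to be the RemoteException near the top: the NameNode reports "There are 0 datanode(s) running", so the hadoop fs -put of mrjob.zip that mrjob issues has no DataNode to write the block to, and everything after that is fallout from the failed upload. A minimal sketch for checking the live-DataNode count, assuming the hdfs script sits next to the hadoop binary mrjob found:

import subprocess

# 'hdfs dfsadmin -report' asks the NameNode for cluster status, including
# the number of live DataNodes. The path is assumed from mrjob's log above.
report = subprocess.run(
    ['/usr/local/hadoop/bin/hdfs', 'dfsadmin', '-report'],
    capture_output=True, text=True, check=True,
)
print(report.stdout)  # a healthy single-node setup reports "Live datanodes (1)"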
The content of wordaccount.py is as follows; all it does is count how many times each word occurs:
from mrjob.job import MRJob

class WordCount(MRJob):

    def mapper(self, _, line):
        # Emit (word, 1) for every whitespace-separated token in the line.
        for word in line.split():
            yield word, 1

    def reducer(self, key, values):
        # Sum all the 1s emitted for this word.
        yield key, sum(values)

if __name__ == '__main__':
    WordCount.run()
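For what it's worth, the job logic can be sanity-checked without Hadoop at all, because mrjob falls back to its inline runner when -r hadoop is omitted. A minimal sketch, assuming the file layout from the command above:

from wordaccount import WordCount

# Run the job in-process with mrjob's default inline runner (no Hadoop
# involved) to confirm the mapper/reducer logic itself is fine.
job = WordCount(args=['data/hamlet.txt'])
with job.make_runner() as runner:
    runner.run()
    for word, count in job.parse_output(runner.cat_output()):
        print(word, count)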
I'm sure I started Hadoop via sbin/start-all.sh, and both wordaccount.py and hamlet.txt really exist. Following the traceback, I also located mrjob's hadoop.py.
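One note: start-all.sh finishing does not by itself prove the DataNode stayed up. A quick way to list which Hadoop daemons are actually alive is jps (it ships with the JDK); a small sketch wrapping it in Python:

import subprocess

# jps lists running JVM processes; after start-all.sh a healthy
# pseudo-distributed node should show NameNode, DataNode,
# SecondaryNameNode, ResourceManager and NodeManager.
print(subprocess.run(['jps'], capture_output=True, text=True).stdout)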
So what changes should I make? Thanks.