I have the following program:

from mrjob.job import MRJob
from mrjob.step import MRStep

class RatingsBreakdown(MRJob):
    def steps(self):
        return [
                MRStep(mapper=self.mapper_get_ratings,
                       reducer=self.reducer_count_ratings)
                ]

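    def mapper_get_ratings(self, _, line):
        # Each input line is tab-separated: userID, movieID, rating, timestamp.
        (userID, movieID, rating, timestamp) = line.split('\t')
        # Emit one count per rating value.
        yield rating, 1

    def reducer_count_ratings(self, key, values):
        # Sum the counts emitted for each rating value.
        yield key, sum(values)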


if __name__ == '__main__':
    RatingsBreakdown.run()

and I am trying to run it on Ubuntu 18.04 with:

sudo python3 RatingsBreakdown.py -r hadoop --hadoop-bin /usr/local/hadoop/bin/hadoop u.data

where u.data is the data source.
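
To make clear what the job is meant to compute, here is a minimal plain-Python sketch of the same count without mrjob or Hadoop. The sample lines and the helper name `count_ratings` are made up for illustration; only the tab-separated column order is taken from the mapper above.

```python
from collections import Counter

# Hypothetical sample lines in the tab-separated layout the mapper expects:
# userID, movieID, rating, timestamp (values are made up for illustration).
SAMPLE_LINES = [
    "1\t10\t3\t881250949",
    "2\t11\t3\t891717742",
    "3\t12\t1\t878887116",
    "4\t13\t5\t880606923",
]


def count_ratings(lines):
    """Count how many times each rating value occurs in the given lines."""
    counts = Counter()
    for line in lines:
        # Same split as the mapper; only the rating column is used.
        _user_id, _movie_id, rating, _timestamp = line.rstrip("\n").split("\t")
        counts[rating] += 1
    return dict(counts)


result = count_ratings(SAMPLE_LINES)
print(result)
```

The ratings stay strings here, just as they do in the mapper, because nothing converts them after the split.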

The program stops and I keep getting the following error:

OSError: Could not mkdir hdfs:///user/root/tmp/mrjob/RatingsBreakdown.root.20191110.010957.606661/files/wd

When I try running the mkdir command manually I get:

mkdir: Incomplete HDFS URI, no host: hdfs:///user/root/tmp/mrjob/RatingsBreakdown.root.20191110.010957.606661/files/w
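
As a sanity check, the URI can be split with Python's standard `urllib.parse` to see that an `hdfs:///...` form really carries no host part. This is only a sketch: the helper name `describe_uri` and the `localhost:9000` address are hypothetical, the path is shortened, and I am assuming Hadoop's "no host" complaint refers to the same empty authority component.

```python
from urllib.parse import urlparse


def describe_uri(uri):
    """Split a URI into the parts relevant to the 'no host' message."""
    parsed = urlparse(uri)
    return {
        "scheme": parsed.scheme,
        "host": parsed.hostname,  # None when the authority part is empty
        "port": parsed.port,
        "path": parsed.path,
    }


# Shortened form of the URI from the error: three slashes, so no host.
no_host = describe_uri("hdfs:///user/root/tmp/mrjob")

# Hypothetical explicit NameNode address, for comparison only.
with_host = describe_uri("hdfs://localhost:9000/user/root/tmp/mrjob")

print(no_host)
print(with_host)
```

Both URIs have the same scheme and path; only the host and port differ.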

I should mention that I have a working Hadoop installation (it runs Java-based programs) and that the Python environment is set up correctly. If I don't use the hadoop runner (that is, without `-r hadoop`), the program executes correctly. It seems that there is an interaction problem between Python (MRJob) and Hadoop.

I searched and searched but can't seem to find anything helpful. Please help me! Thanks

calin.bule
  • Can you create it if you use this format? hdfs://localhost:9000/ – oppressionslayer Nov 10 '19 at 02:09
  • I just tried `hadoop fs -mkdir -p hdfs://localhost:9000/user/root/tmp/mrjob/RatingsBreakdown.root.20191110.010957.606661/files/w` and I get a Connection refused message, so that does not work. – calin.bule Nov 10 '19 at 08:33
  • Are you sure the service is running, or are you maybe running into a firewall issue? – oppressionslayer Nov 10 '19 at 08:44
  • I added a new rule on the firewall for the port 9000 and now I get: `mkdir: Call From ubuntu/127.0.1.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused` – calin.bule Nov 10 '19 at 20:19
