
I downloaded the MovieLens dataset ml-100k.zip from that hyperlink (it is a movie and user information dataset, listed under the older datasets tab) and wrote the simple MapReduce code below:

from mrjob.job import MRJob


class MoviesByUserCounter(MRJob):
    def mapper(self, key, line):
        # Each line of u.data holds: userID, movieID, rating, timestamp (tab-separated)
        (userID, movieID, rating, timestamp) = line.split('\t')
        yield userID, movieID

    def reducer(self, user, movies):
        # Count how many movies this user has rated
        numMovies = 0
        for movie in movies:
            numMovies = numMovies + 1

        yield user, numMovies


if __name__ == '__main__':
    MoviesByUserCounter.run()

I use Python 3.5.3 and PyCharm Community Edition as my Python IDE.

I tried running it from the command line:

python my_code.py 

But it doesn't work as I expected. It runs, but it just waits without producing any response; it has been running for a while and is still going. The only output on the command line is:

Running step 1 of 1...
reading from STDIN

How can I pass the data (u.data, the data file inside ml-100k.zip) to my Python program on the command line? Any other solutions would be great too.

Thanks in advance.

John Vandenberg
pcpcne

1 Answer


If I am not mistaken, you want to give your data as a command line argument.

You would want to do this using sys.argv. Barring that, look at a CLI (Command Line Interface) library.

Example:

import sys

def main(arg1, arg2, *extra_args):
    # do something with the arguments
    pass

if __name__ == "__main__":
    # There are not enough arguments
    if len(sys.argv) < 3:
        raise SystemExit("Too few arguments.")
    # Anything after the first two arguments is passed through as extra strings
    # (sys.argv[3:] is simply empty when there are none)
    main(sys.argv[1], sys.argv[2], *sys.argv[3:])

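This way the first two arguments are position-dependent, like normal Python positional arguments, and anything after them is passed through as extra arguments. Note that an extra argument written as a=1 arrives as the plain string "a=1"; it does not become a Python keyword argument unless you parse it yourself.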

Example use:

Passing the data file as the first argument and a parameter as the second:

python my_code.py data.zip 0.1 
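Applied to the question's data, the sys.argv approach might look like the sketch below. It does not use mrjob at all; it assumes u.data has been extracted from the zip and is tab-separated in the order userID, movieID, rating, timestamp, as in the question's mapper. The names count_movies_per_user and main are my own, not part of any library. As far as I know, mrjob also treats positional command-line arguments as input paths, so python my_code.py u.data may be all the original script needs.

```python
import sys
from collections import Counter


def count_movies_per_user(lines):
    """Count how many rating lines each user ID has (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumed layout: userID, movieID, rating, timestamp (tab-separated)
        user_id, movie_id, rating, timestamp = line.split("\t")
        counts[user_id] += 1
    return counts


def main(argv):
    # argv[0] is the script name; argv[1] should be the path to u.data
    if len(argv) < 2:
        print("usage: python my_code.py <path-to-u.data>")
        return 1
    with open(argv[1]) as handle:
        counts = count_movies_per_user(handle)
    for user_id, num_movies in sorted(counts.items()):
        print(user_id, num_movies)
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

It would be run as python my_code.py u.data, with the path adjusted to wherever the file was extracted.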

If you will be using more than a few command line parameters, you will want to spend time with a CLI library so that they are no longer position-dependent.
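As one possible CLI library, the standard library's argparse can do this. The following is a minimal sketch: the argument names data_file and --threshold are hypothetical, chosen only to mirror the example command above, and the argument list is passed explicitly here so the snippet runs on its own.

```python
import argparse


def build_parser():
    """Build a parser with one positional and one named option (names are illustrative)."""
    parser = argparse.ArgumentParser(description="Process a data file.")
    # Positional argument: must be given, identified by its position
    parser.add_argument("data_file", help="path to the input data file")
    # Named option: can appear anywhere on the command line and has a default
    parser.add_argument("--threshold", type=float, default=0.1,
                        help="a numeric parameter (hypothetical)")
    return parser


# In a real script you would call build_parser().parse_args() with no
# arguments, which reads sys.argv[1:]. An explicit list is used here instead.
args = build_parser().parse_args(["u.data", "--threshold", "0.5"])
print(args.data_file, args.threshold)
```

With this, python my_code.py --threshold 0.5 u.data and python my_code.py u.data --threshold 0.5 should be parsed the same way, and argparse generates a --help message from the help strings.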

Jeremy Barnes