How to integrate data with python code before running python program on command line

Question

I downloaded the MovieLens dataset ml-100k.zip from the linked page (it is a dataset of movie and user information, found under the older datasets tab) and wrote the simple MapReduce code below:

from mrjob.job import MRJob

class MoviesByUserCounter(MRJob):
    def mapper(self, key, line):
        # Each input line is: userID <TAB> movieID <TAB> rating <TAB> timestamp
        (userID, movieID, rating, timestamp) = line.split('\t')
        yield userID, movieID

    def reducer(self, user, movies):
        # Count how many movies this user has rated
        numMovies = 0
        for movie in movies:
            numMovies = numMovies + 1

        yield user, numMovies

if __name__ == '__main__':
    MoviesByUserCounter.run()

I use Python 3.5.3 and PyCharm Community Edition as my Python IDE.

I tried running it on the command line:

python my_code.py

but it doesn't work as I expected. It does start, but then it waits without any response; it has been running for a while and is still going. The only output on the command line is:

Running step 1 of 1...
reading from STDIN

How can I pass the data (u.data, the data file inside ml-100k.zip) to my Python program on the command line? Any other solutions would be great too.

Thanks in advance.

https://pythonhosted.org/mrjob/guides/quickstart.html#running-your-job-different-ways — Stop harming Monica, Jul 18 '17 at 12:06
Thanks a lot, Goyo, but I had already tried those before asking, and they didn't work either. — pcpcne, Jul 18 '17 at 12:18
You could look at the [argparse module](https://pypi.python.org/pypi/argparse) — Professor_Joykill, Jul 18 '17 at 12:40

score 1 · Accepted Answer · answered Jul 18 '17 at 13:47

If I am not mistaken, you want to give your data as a command line argument.

You would want to do this using sys.argv. Barring that, look at a CLI (Command Line Interface) library.

Example:

import sys

def main(arg1, arg2, *args):
    # do something with the two required arguments and any extras
    pass

if __name__ == "__main__":
    # sys.argv[0] is the script name, so two arguments means len(sys.argv) == 3
    if len(sys.argv) < 3:
        raise SystemExit("Too few arguments.")
    if len(sys.argv) > 3:
        # There are extra arguments after the first two
        main(sys.argv[1], sys.argv[2], *sys.argv[3:])
    else:
        # no extra arguments
        main(sys.argv[1], sys.argv[2])

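In this way, the first two arguments are position-dependent, like normal Python positional arguments, and anything after them is passed through as extra arguments. Note that sys.argv only holds strings, so an argument written as a=1 arrives as the literal string 'a=1' and has to be split by hand if you want to treat it as a keyword argument.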

Example use:

Passing the data file as the first argument and a parameter as the second:

python my_code.py data.zip 0.1
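As a rough sketch of what handling the first argument could look like, the snippet below counts rated movies per user in plain Python, without mrjob. It assumes the file has already been extracted from the zip and uses the tab-separated layout from the question (userID, movieID, rating, timestamp). The names count_movies_per_user and main, and the sample lines, are made up for illustration.

```python
import sys
from collections import Counter


def count_movies_per_user(lines):
    """Count rated movies per user from tab-separated rating lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        user_id, movie_id, rating, timestamp = line.split("\t")
        counts[user_id] += 1
    return counts


def main(argv):
    # argv[1] is assumed to be the path to the extracted data file
    if len(argv) < 2:
        raise SystemExit("usage: python my_code.py <path-to-data-file>")
    with open(argv[1]) as data_file:
        counts = count_movies_per_user(data_file)
    for user_id, num_movies in sorted(counts.items()):
        print(user_id, num_movies)


# In a real script the entry point would call main(sys.argv).
# Here the counting function is exercised on a few made-up sample lines.
sample = [
    "196\t242\t3\t881250949\n",
    "186\t302\t3\t891717742\n",
    "196\t51\t5\t881250949\n",
]
print(count_movies_per_user(sample))
```

Keeping the counting logic in its own function means it can be checked on a handful of lines before pointing it at the full data file.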

If you will be using more than a few command line parameters, you will want to spend time with a CLI library so that they are no longer position-dependent.
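One such library is argparse from the standard library (also mentioned in the comments). Below is a minimal sketch; the argument names data_file and --threshold are hypothetical and chosen only to mirror the example use above.

```python
import argparse


def build_parser():
    # Hypothetical argument names, used only for illustration
    parser = argparse.ArgumentParser(description="Process a ratings data file.")
    # Required positional argument: the path to the data file
    parser.add_argument("data_file", help="path to the data file")
    # Optional named argument, so its position on the command line is free
    parser.add_argument("--threshold", type=float, default=0.1,
                        help="an example numeric parameter")
    return parser


# In a real script you would call build_parser().parse_args() with no
# arguments so that it reads sys.argv; an explicit list is used here instead.
args = build_parser().parse_args(["u.data", "--threshold", "0.5"])
print(args.data_file, args.threshold)
```

With this in place, an invocation such as python my_code.py --threshold 0.5 u.data and one with the two parts swapped should parse to the same values, and argparse generates a --help message from the descriptions.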
