I have downloaded the MovieLens dataset ml-100k.zip (a dataset of movie and user information, found under the older datasets tab) and have written the simple MapReduce code below:
from mrjob.job import MRJob


class MoviesByUserCounter(MRJob):
    def mapper(self, key, line):
        # Each input line is: userID, movieID, rating, timestamp (tab-separated).
        (userID, movieID, rating, timestamp) = line.split('\t')
        yield userID, movieID

    def reducer(self, user, movies):
        # Count how many movies this user has rated.
        numMovies = 0
        for movie in movies:
            numMovies = numMovies + 1
        yield user, numMovies


if __name__ == '__main__':
    MoviesByUserCounter.run()
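To make the intended result clear, here is a minimal sketch in plain Python (no mrjob involved) of what I expect the job to compute: the number of rated movies per user. The function name `count_movies_by_user` and the sample records are made up for illustration; the records only imitate the tab-separated layout of u.data.

```python
from collections import Counter


def count_movies_by_user(lines):
    """Return a dict mapping each userID to the number of movies it rated."""
    counts = Counter()
    for line in lines:
        # Same record layout the mapper assumes:
        # userID <TAB> movieID <TAB> rating <TAB> timestamp
        user_id, movie_id, rating, timestamp = line.rstrip('\n').split('\t')
        counts[user_id] += 1
    return dict(counts)


if __name__ == '__main__':
    # Hypothetical sample records, not real rows from the dataset.
    sample = [
        "1\t10\t3\t100000001",
        "2\t10\t4\t100000002",
        "1\t11\t5\t100000003",
    ]
    print(count_movies_by_user(sample))
```

With these sample records, user 1 should end up with a count of two and user 2 with a count of one, which is the kind of output I expect from the reducer for every user in u.data.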
I use Python 3.5.3 and PyCharm Community Edition as my Python IDE.
I ran the script from the command line:
python my_code.py
but it doesn't behave as I expected. It starts, but then it just waits and never responds; it has been running for a while and is still going. The only output on the command line is:
Running step 1 of 1...
reading from STDIN
How can I successfully pass the data (u.data, the data file in ml-100k.zip) to my Python program on the command line? Any other solutions would be great too.
Thanks in advance.