I have a text document full of lines of tweets that I need to run a MapReduce job on. I am using Python and MRJob to do so with the following code:
from mrjob.job import MRJob
import re
import datetime

class exerciseOne(MRJob):
    def mapper(self, _, line):
        fields = line.split(";")
        epochtemp = int(fields[0])
        difference = epochtemp/1000.0
        key = datetime.datetime.fromtimestamp(difference).strftime('%Y-%m-%d')
        yield(key, 1)

if __name__ == '__main__':
    exerciseOne.run()
A small sample of the text that needs to be analysed is contained here: https://textuploader.com/dnx59 if anyone is interested.
The issue I am having is that I don't know how to iterate through the lines inside the mapper method so that a key-value pair is generated for every line. I have tried the following:
for line in lines
and
while(line)
but neither has worked for the dataset I am using. How can I correctly loop through the lines?
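To make clear what I mean, here is a minimal sketch in plain Python (no mrjob involved) of the behaviour I am after. It rests on my understanding that, for raw text input, the framework itself calls the mapper once per input line, so the loop over lines would live in the driver rather than in the mapper. The names run_job and sample_lines, the made-up sample rows, and the use of UTC instead of local time are all assumptions for illustration, not part of my real job or of the mrjob API.

```python
import datetime
from collections import defaultdict


def mapper(_, line):
    """Handle exactly one input line, mirroring mapper(self, _, line)."""
    fields = line.split(";")
    epoch_ms = int(fields[0])
    # Assumption: UTC, so the date does not depend on the machine's timezone.
    day = datetime.datetime.fromtimestamp(
        epoch_ms / 1000.0, tz=datetime.timezone.utc
    )
    yield day.strftime("%Y-%m-%d"), 1


def reducer(key, values):
    """Sum the 1s emitted for a single date key."""
    yield key, sum(values)


def run_job(lines):
    """Hypothetical stand-in for the framework: it owns the loop over lines."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(None, line):
            grouped[key].append(value)
    results = {}
    for key in sorted(grouped):
        for out_key, total in reducer(key, grouped[key]):
            results[out_key] = total
    return results


# Made-up rows: epoch milliseconds first, then arbitrary ";"-separated fields.
sample_lines = [
    "1469453965000;1;first made-up tweet",
    "1469453970000;2;second made-up tweet",
    "1469540365000;3;third made-up tweet",
]
counts = run_job(sample_lines)
print(counts)
```

If that understanding is right, the equivalent in my class would presumably be to leave the mapper as it is and add a reducer(self, key, values) method that yields (key, sum(values)).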