
I have a text document full of lines of tweets that I need to run a MapReduce job on. I am using Python and MRJob to do so with the following code:

from mrjob.job import MRJob
import re
import datetime

class exerciseOne(MRJob):

    def mapper(self, _, line):
        fields = line.split(";")
        epochtemp = int(fields[0])
        difference = epochtemp/1000.0
        key = datetime.datetime.fromtimestamp(difference).strftime('%Y-%m-%d')
        yield(key, 1)

if __name__ == '__main__':
    exerciseOne.run()

A small sample of the text that needs to be analysed is contained here: https://textuploader.com/dnx59 if anyone is interested.

The issue I am having is that I don't know how to iterate through the lines in the mapper method to generate all the key-value pairs. I have tried the following:

for line in lines

and

while(line)

but neither has worked for the dataset I am using. How can I correctly loop through the lines?

halfer
faboys

1 Answer


I'm not familiar with this library, but I think the pattern you are looking for is this:

Instantiate the class and call its mapper:

line_mapper = exerciseOne()
key_generator = line_mapper.mapper(None, text_blob)

Here text_blob is the block of text you've linked to. You can then iterate over the key_generator object, for example with a for loop.
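One caveat: the mapper in the question only reads fields[0], so passing the whole block of text in a single call would yield just one pair. As far as I understand MRJob, the framework itself calls mapper once per input line when the job runs, so no explicit loop is needed inside the mapper. To try the logic by hand, a minimal sketch could look like the one below. It uses only the standard library; the standalone mapper function, the sample lines, and the use of UTC are assumptions for illustration (the original code uses the local time zone).

```python
from collections import Counter
from datetime import datetime, timezone


def mapper(line):
    """Yield (date, 1) for one semicolon-separated tweet line."""
    fields = line.split(";")
    epoch_ms = int(fields[0])  # first field: epoch time in milliseconds
    # Assumption: UTC is used here so the result does not depend on the machine.
    day = datetime.fromtimestamp(epoch_ms / 1000.0, tz=timezone.utc)
    yield day.strftime("%Y-%m-%d"), 1


# Hypothetical sample lines, not taken from the real dataset.
text_blob = "\n".join([
    "1469453965000;id1;first tweet",
    "1469453999000;id2;second tweet",
    "1469540365000;id3;third tweet",
])

counts = Counter()
for line in text_blob.splitlines():   # one mapper call per line
    for key, value in mapper(line):   # iterate over the generator
        counts[key] += value          # stands in for a summing reducer

print(dict(counts))
```

The Counter plays the role a reducer would play in the real job, summing the 1s emitted for each date.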

Sven Harris