0

I have such a job:

from mrjob.job import MRJob
from mrjob.step import MRStep
import urllib
import re
import httpagentparser

UA_STRING = re.compile(MYSUPERCOMPLEXREGEX)

class MRReferralAnalysis(MRJob):

    def mapper(self, _, line):

        for group in UA_STRING.findall(line):

            ua = httpagentparser.simple_detect(group)
            yield (ua, 1)

    def reducer(self, itemOfInterest, counts):

        yield (sum(counts), itemOfInterest)

    def steps(self):
        return [
            MRStep( mapper=self.mapper,
                    reducer=self.reducer)
        ]

if __name__ == '__main__':
    MRReferralAnalysis.run()

Now I want to call this mrjob program multiple times (about two dozen times), with varying parameters that are fetched from another file and passed into my MYSUPERCOMPLEXREGEX. Is that even possible with mrJob and how to schedule the tasks? Or write a wrapper program that triggers the jobs?

Stephan Kristyn
  • 15,015
  • 14
  • 88
  • 147

1 Answers1

0

Wrap the MRReferralAnalysis.run() call in a loop, and read in your configuration immediately before the loop. The job will then run multiple times.

Ben Watson
  • 5,357
  • 4
  • 42
  • 65