I have such a job:
from mrjob.job import MRJob
from mrjob.step import MRStep
import urllib
import re
import httpagentparser
UA_STRING = re.compile(MYSUPERCOMPLEXREGEX)
class MRReferralAnalysis(MRJob):
def mapper(self, _, line):
for group in UA_STRING.findall(line):
ua = httpagentparser.simple_detect(group)
yield (ua, 1)
def reducer(self, itemOfInterest, counts):
yield (sum(counts), itemOfInterest)
def steps(self):
return [
MRStep( mapper=self.mapper,
reducer=self.reducer)
]
if __name__ == '__main__':
MRReferralAnalysis.run()
Now I want to call this mrjob program multiple times (about two dozen times), with varying parameters that are fetched from another file and passed into my MYSUPERCOMPLEXREGEX. Is that even possible with mrJob and how to schedule the tasks? Or write a wrapper program that triggers the jobs?