I've created an Amazon EMR job using mrjob. My MapReduce job inherits from a common helper class that makes parsing Apache logs easier. That class is shared among several MapReduce jobs, so this is my file structure:
__init__.py
count_ip.py (mapreduce job)
common/apache.py (base class count_ip.py inherits from)
I'd like to automatically tar my full src directory on my local machine and have mrjob upload it to Amazon EMR. Right now I have a tar file containing the common directory, common.tar.gz. I've added this tarball to my Python packages in mrjob.conf, and it works fine. What I'd like is for common.tar.gz to be created automatically. Does mrjob have any support for this, and if not, what options do I have?
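
To make the goal concrete, here is a minimal sketch of the packaging step I'd like to have automated, using only Python's standard tarfile module. The function name build_common_tarball and its default paths are hypothetical, chosen to match the layout above; this is not an mrjob API, just the fallback I would script myself if mrjob has no built-in support.

```python
import os
import tarfile


def build_common_tarball(src_dir="common", out_path="common.tar.gz"):
    """Pack src_dir into a gzipped tarball and return the tarball's path.

    Hypothetical helper: the name and defaults are assumptions based on
    the file structure described in the question.
    """
    # Store entries under the directory's own name (e.g. common/apache.py)
    # regardless of where src_dir lives on the local machine.
    arcname = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(out_path, "w:gz") as tar:
        # tar.add() recurses into directories by default.
        tar.add(src_dir, arcname=arcname)
    return out_path
```

Calling build_common_tarball() from a small wrapper script before launching count_py-style jobs would keep the archive referenced in mrjob.conf up to date, but I'd prefer not to maintain that extra step if mrjob can do it for me.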