I'm trying to make use of an external library in my Python mapper script in an AWS Elastic MapReduce job.
However, my script doesn't seem to be able to find the modules in the cache. I archived the files into a tarball called helper_classes.tar
and uploaded the tarball to an Amazon S3 bucket. When creating my MapReduce job on the console, I specified the argument as:
cacheArchive s3://folder1/folder2/helper_classes.tar#helper_classes
At the beginning of my Python mapper script, I included the following code to import the library:
import sys
sys.path.append('./helper_classes')
import geoip.database
When I run the MapReduce job, it fails with an ImportError: No module named geoip.database.
(geoip is a folder in the top level of helper_classes.tar, and database is the module I'm trying to import.)
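To narrow this down, here is a minimal sketch of a check that could run at the top of the mapper, before the import, to show what the task actually sees under ./helper_classes. The helper name describe_package_dir is hypothetical (it is not part of Hadoop or geoip). The sketch assumes the archive is unpacked under the symlink name given after the # in the cacheArchive argument, and that anything written to stderr ends up in the task's stderr log. Checking for __init__.py is also only a guess at one possible cause: Python 2 will not treat geoip as a package without it.

```python
import os
import sys


def describe_package_dir(base_dir, package):
    """Return human-readable findings about base_dir/package.

    Hypothetical debugging helper: it only inspects the filesystem
    and does not attempt the import itself.
    """
    findings = ["cwd: %s" % os.getcwd()]

    # Is the unpacked archive (or its symlink) visible at all?
    if not os.path.isdir(base_dir):
        findings.append("missing directory: %s" % base_dir)
        return findings
    findings.append("entries in %s: %s" % (base_dir, sorted(os.listdir(base_dir))))

    # Is the package folder at the top level, as expected?
    pkg_dir = os.path.join(base_dir, package)
    if not os.path.isdir(pkg_dir):
        findings.append("missing package directory: %s" % pkg_dir)
        return findings

    # Python 2 needs __init__.py for a folder to be importable as a package.
    init_file = os.path.join(pkg_dir, "__init__.py")
    findings.append("%s present: %s" % (init_file, os.path.isfile(init_file)))
    return findings


if __name__ == "__main__":
    # Write to stderr so the findings do not mix with the mapper's stdout records.
    for line in describe_package_dir("./helper_classes", "geoip"):
        sys.stderr.write(line + "\n")
```

If the output reports a missing directory or an extra nesting level (for example helper_classes/helper_classes/geoip), the path passed to sys.path.append would need to change accordingly.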
Any ideas what I could be doing wrong?