I have two Python scripts that are intended to run on Amazon Elastic MapReduce, one as a mapper and one as a reducer. I've recently expanded the mapper script to require a couple more local models I've created, both of which live in a package called SentimentAnalysis. What's the right way to have a Python script import from a local Python package on S3? I tried creating S3 keys that mimic my file system, in the hope that the relative paths would work, but alas they didn't. Here's what I see in the log files on S3 after the step failed:
Traceback (most recent call last):
  File "/mnt/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201407250000_0001/attempt_201407250000_0001_m_000000_0/work/./sa_mapper.py", line 15, in <module>
    from SentimentAnalysis import NB, LR
ImportError: No module named SentimentAnalysis
The relevant file structure is like this:
sa_mapper.py
sa_reducer.py
SentimentAnalysis/NB.py
SentimentAnalysis/LR.py
And sa_mapper.py has:
from SentimentAnalysis import NB, LR
I tried to mirror the file structure in S3, but that doesn't seem to work.
What's the best way to set up S3 or EMR so that sa_mapper.py can import NB.py and LR.py? Is there some special trick to doing this?
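For example, is the answer something along the lines of bundling the package as a zip, shipping it with the step, and adding it to sys.path before the import? Here's a minimal sketch of what I have in mind for the top of sa_mapper.py, assuming an archive named SentimentAnalysis.zip ends up in the task's working directory (both the archive name and its location are just guesses on my part):

import os
import sys

# Guess: a shipped SentimentAnalysis.zip lands next to the mapper in the
# task's working directory, so put it on the import path before importing.
here = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, os.path.join(here, 'SentimentAnalysis.zip'))

from SentimentAnalysis import NB, LR

Or is there a better-supported way to make the package available on every node?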