I am running MapReduce jobs on Hive, and most of the code already resides in a git repo. I know I can include instructions in the bootstrap script when spinning up clusters, but is it possible to do both of these things:

  • Adjust the Python path in the bash_profile so that the functions in the repo can be imported
  • Pull the git repo and make all the scripts in it available to the Hive scripts?

For the second point, how would I reference a script that lives in the git repo from my Hive script, as in the sample below:

FROM (
MAP 
table.values
USING 
'python script_from_repo.py' 
AS params
FROM 
big_table
) ..........;

Really appreciate any help.

intl
  • Why not pull it from an S3 bucket (which the EMR script-runner supports) and synchronize the S3 location with GitHub during bootstrap? Would that work? – user1452132 Nov 26 '15 at 14:05
  • Possibly. I could do a git clone as well, but the issue would still be the bash profile mod I'd have to make to make the imports work for python. – intl Nov 27 '15 at 07:31
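
One possible way around the bash_profile change raised in the last comment is to have the script itself put the cloned repo on the import path, so nothing depends on the shell environment that the Hive task happens to run under. The following is only a minimal sketch under assumptions: the `REPO_DIR` environment variable, the default location `/home/hadoop/repo`, and the helper name `add_repo_to_path` are all hypothetical and not taken from the question, and the repo is assumed to have been cloned to the same path on every node during bootstrap.

```python
import os
import sys


def add_repo_to_path(repo_dir):
    """Put repo_dir at the front of sys.path (once) so its modules can be imported."""
    repo_dir = os.path.abspath(repo_dir)
    if repo_dir not in sys.path:
        sys.path.insert(0, repo_dir)
    return repo_dir


# Hypothetical clone location; in practice the bootstrap step would decide this.
REPO_DIR = os.environ.get("REPO_DIR", "/home/hadoop/repo")
add_repo_to_path(REPO_DIR)

# Imports from the repo would go below this point, after sys.path is adjusted.
```

Placed at the top of `script_from_repo.py`, this would let the script import the repo's modules without any change to bash_profile. On the Hive side, `ADD FILE` is the usual way to ship a script to the task nodes before referencing it in `USING`; whether that or a fixed path on each node fits better here depends on how the cluster is bootstrapped.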
