I would like to write MapReduce code, ideally in Python, on my Apple Mac and stream it to a Hadoop sandbox (e.g. Hortonworks or Cloudera).
Ideally, my development setup would use the Python environment on my Mac together with a Hadoop VM sandbox (and later a cluster on the same network).
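To make the question concrete, here is a minimal sketch of the kind of Python code I mean. It assumes the usual Hadoop Streaming contract: the mapper and reducer read lines from stdin, write tab-separated key/value lines to stdout, and the reducer receives its input sorted by key. The word-count task and the function names `map_lines` and `reduce_lines` are only illustrative, not taken from any particular tutorial.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper step: emit 'word<TAB>1' for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer step: sum the counts per word.

    Assumes the input is already sorted by key, which is what
    Hadoop Streaming provides between the map and reduce phases.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def simulate(lines):
    """Local stand-in for the cluster: map, sort by key, then reduce."""
    return list(reduce_lines(sorted(map_lines(lines))))
```

On the Mac this can be checked locally with `simulate(["a b a"])`. In actual streaming scripts, each function would presumably be driven by `sys.stdin` and its output printed to stdout.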
While there are many descriptions of how to connect or stream code from within a node of the Hadoop cluster (e.g. from the NameNode), I am unclear on what to do from outside the cluster.
For example, I assume I need to install some Hadoop client libraries. Where do I get these libraries from?
How do I install them?
Which Python package works best?
What IP address should I use to stream my Python code?
Any help, and any link to a tutorial covering this, would be great!