
I'm trying to run a Hive script on AWS EMR using the PHP SDK. How can I pass the script parameters (such as input, output, and the dates to work on)?

Thanks

Gluz

2 Answers


If you are struggling with this as well...

Sample code for passing variables to a Hive script can be found in the following Amazon forum thread.

hailrok

I've done this with the Java SDK. With the PHP SDK, you essentially need to pass in the parameters you want through the add_job_flow_steps function.

You need to add the parameters to the StepConfig (for the script you are running) in the "Args" array when calling the function.

Args - string|array - Optional - A list of command line arguments passed to the JAR file’s main function when executed. Pass a string for a single value, or an indexed array for multiple values.

The format of the arguments is a bit confusing: you need an array of the form

("-d","yourVariable=itsValue","-d","anotherVariable=AnotherValue")

So it should end up looking a bit like this:

// Untested sketch; $emr is assumed to be an AmazonEMR client instance.
$emr->add_job_flow_steps('j-19430859jg9', array(
    new CFStepConfig(array(
        'Name' => 'Run a hive script',
        'HadoopJarStep' => array(
            'Jar' => CFHadoopStep::run_hive_script(),
            'Args' => array("-d", "yourVariable=itsValue", "-d", "anotherVariable=AnotherValue"),
        ),
    )),
));

I don't know if the syntax is quite right; I haven't tried it.

At least this is how it works in Java. For PHP you may need an associative array instead, so I would try a few formats.
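
If your parameters start out as a PHP associative array, a small helper can flatten them into the "-d", "name=value" list described above. This is only a sketch: build_hive_args is a hypothetical name of my own, not part of the AWS SDK, and the variable names and S3 paths are placeholders.

```php
<?php
// Hypothetical helper (not part of the AWS SDK): turns an associative array
// of Hive variables into the flat "-d", "name=value" argument list.
function build_hive_args(array $variables)
{
    $args = array();
    foreach ($variables as $name => $value) {
        $args[] = '-d';                  // marks the next item as a variable definition
        $args[] = $name . '=' . $value;  // e.g. "INPUT=s3://my-bucket/input/"
    }
    return $args;
}

// Placeholder values, for illustration only.
$args = build_hive_args(array(
    'INPUT'  => 's3://my-bucket/input/',
    'OUTPUT' => 's3://my-bucket/output/',
));
print_r($args);
```

The resulting array could then be used as the 'Args' value of the step, assuming the flat format is what the PHP SDK expects.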

I expect the "-d" prefix is there so that these parameters are not confused with other Hadoop/Hive configuration parameters.

You can then access these variables in the script much as you would in bash, using ${yourVariable}.

SELECT * FROM TABLE WHERE column='${yourVariable}';
karthikr
danmbyrd