I am trying to get a simple PigActivity to work in AWS Data Pipeline: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-pigactivity.html#pigactivity
The input and output fields are required for this activity. I have both set to use an S3DataNode, and both of these data nodes have a directoryPath that points to my S3 input and output locations. I originally tried to use filePath but got the following error:
PigActivity requires 'directoryPath' in 'Output' object.
I am using a custom Pig script, also located in S3.
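For reference, my data nodes look roughly like this (the bucket and path names here are placeholders for my actual S3 locations):

{
  "id": "MyInputDataNode",
  "type": "S3DataNode",
  "schedule": { "ref": "MySchedule" },
  "directoryPath": "s3://my-bucket/input"
}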
My question is: how do I reference these input and output paths in my script?
The example given in the reference uses the stage field (which can be enabled or disabled). My understanding is that this is used to convert the data into tables. I don't want to do this, as it also requires that you specify a dataFormat field.
Determines whether staging is enabled and allows your Pig script to have access to the staged-data tables, such as ${INPUT1} and ${OUTPUT1}.
I have disabled staging and I am trying to access the data in my script as follows:
input = LOAD '$Input';
But I get the following error:
IOException. org.apache.pig.tools.parameters.ParameterSubstitutionException: Undefined parameter : Input
I have tried using:
input = LOAD '${Input}';
But I get an error for this too.
There is also the optional scriptVariable field. Do I have to use some sort of mapping here?
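For example, I am wondering whether I need something like the following in my PigActivity (this is just a guess; the ids and paths are placeholders, and I am not sure whether #{input.directoryPath}-style expressions are even allowed in scriptVariable):

{
  "id": "MyPigActivity",
  "type": "PigActivity",
  "runsOn": { "ref": "MyEmrCluster" },
  "scriptUri": "s3://my-bucket/scripts/myscript.pig",
  "stage": "false",
  "input": { "ref": "MyInputDataNode" },
  "output": { "ref": "MyOutputDataNode" },
  "scriptVariable": [
    "INPUT=#{input.directoryPath}",
    "OUTPUT=#{output.directoryPath}"
  ]
}

so that the script could then do something like input = LOAD '$INPUT';. Is that the intended use of scriptVariable?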