
I am using AWS Data Pipeline to save a text file to my S3 bucket from RDS. I would like the file name to include the date and the hour, like:

myfile-YYYYMMDD-HH.txt
myfile-20140813-12.txt

I have specified my S3DataNode FilePath as:

s3://mybucketname/out/myfile-#{format(myDateTime,'YYYY-MM-dd-HH')}.txt

When I try to save my pipeline I get the following error:

ERROR: Unable to resolve myDateTime for object:DataNodeId_xOQxz

According to the AWS Data Pipeline documentation for date and time functions this is the proper syntax for using the format function.

When I save the pipeline with a hard-coded date and time, I don't get this error, and my file appears in my S3 bucket and folder as expected.

My thinking is that I need to define "myDateTime" somewhere or use something like a NOW() function.

Can somebody tell me how to set "myDateTime" to the current time (e.g. NOW) or give a workaround so I can format the current time to be used in my FilePath?

davedi

2 Answers


I am not aware of an exact equivalent of NOW() in Data Pipeline. I tried using makeDate with no arguments (just for fun) to see whether that worked; it did not.

The closest equivalents are the runtime variables @scheduledStartTime, @actualStartTime, and @reportProgressTime.

http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-s3datanode.html

The following, for example, should work:

s3://mybucketname/out/myfile-#{format(@scheduledStartTime,'YYYY-MM-dd-HH')}.txt
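As a concrete illustration, a minimal S3DataNode definition using @scheduledStartTime might look like this (the object id and bucket name here are placeholders, not taken from the original pipeline):

```json
{
  "id": "MyOutputDataNode",
  "type": "S3DataNode",
  "filePath": "s3://mybucketname/out/myfile-#{format(@scheduledStartTime,'YYYY-MM-dd-HH')}.txt"
}
```

The #{...} expression is evaluated at runtime, so each scheduled run writes to a file named after its own scheduled start time.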

user1452132
    +1 Thank you. Yes, using **@scheduledStartTime** instead of "myDateTime" did the trick. I also just read that _a [user-defined field](https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-writing-pipeline-definition.html#dp-userdefined-fields) must have a name prefixed with the word "my" in all lower-case letters, followed by a capital letter or underscore character._ I also found other variables/[objects for the CopyActivity](http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-copyactivity.html) – davedi Aug 14 '14 at 18:28
  • and how to subtract a date here in this format because I am unable to do so – Nikhil Parmar Nov 28 '17 at 19:34

Just for fun, here is some more info on Parameters.

At the end of your Pipeline Json (click List Pipelines, select into one, click Edit Pipeline, then click Export), you need to add a Parameters and/or Values object.

I use a myStartDate parameter for backfill processes, which you can manipulate once it is passed in for ad hoc runs. You can give it a static default, but you can't set it to a dynamic value, so it is limited for regularly scheduled tasks. For real-time/scheduled dates, you need to use @scheduledStartTime, etc., as suggested. Here is a sample of setting up some Parameters and/or Values. Both show up under Parameters in the UI. These values can be used throughout your pipeline activities (shell, hive, etc.) with the #{myVariableToUse} notation.

"parameters": [
{
  "helpText": "Put help text here",
  "watermark": "This shows if no default or value set",
  "description": "Label/Desc",
  "id": "myVariableToUse",
  "type": "string"
}
]

And for Values:

"values": {
  "myS3OutLocation": "s3://some-bucket/path",
  "myThreshold": "30000"
}

You cannot add these directly in the UI (yet) but once they are there you can change and save the values.
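To show how such a parameter is consumed, here is a hedged sketch of an activity referencing the values above; the object id, command, and the MyEc2Resource reference are hypothetical and only illustrate the #{...} notation:

```json
{
  "id": "MyShellActivity",
  "type": "ShellCommandActivity",
  "command": "aws s3 cp /tmp/output.txt #{myS3OutLocation}/output-#{format(@scheduledStartTime,'YYYY-MM-dd')}.txt",
  "runsOn": { "ref": "MyEc2Resource" }
}
```

Note that user-defined parameters (myS3OutLocation) and runtime variables (@scheduledStartTime) can be mixed in the same expression.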

williambq