
I was creating a data pipeline to export a DynamoDB table to S3. The template given for serverless YAML does not work with the "PAY_PER_REQUEST" billing mode.

I created one using the AWS console and it worked fine. I then exported its definition and tried to create the same pipeline with serverless, but it gives me the following error:

ServerlessError: An error occurred: UrlReportDataPipeline - Pipeline Definition failed to validate because of following Errors: [{ObjectId = 'TableBackupActivity', errors = [Object references invalid id: 's3://dynamodb-dpl-#{myDDBRegion}/emr-ddb-storage-handler/4.11.0/emr-dynamodb-tools-4.11.0-SNAPSHOT-jar-with-dependencies.jar,org.apache.hadoop.dynamodb.tools.DynamoDBExport,#{output.directoryPath},#{input.tableName},#{input.readThroughputPercent}']}] and Warnings: [].

Can anyone help me with this? The pipeline created through the console works perfectly with the same value for the step in the table backup activity.

The pipeline template is pasted below:

UrlReportDataPipeline:
      Type: AWS::DataPipeline::Pipeline
      Properties: 
        Name: ***pipeline name***
        Activate: true
        ParameterObjects: 
          - Id: "myDDBReadThroughputRatio"
            Attributes: 
              - Key: "description"
                StringValue: "DynamoDB read throughput ratio"
              - Key: "type"
                StringValue: "Double"
              - Key: "default"
                StringValue: "0.9"
          - Id: "myOutputS3Loc"
            Attributes: 
              - Key: "description"
                StringValue: "S3 output bucket"
              - Key: "type"
                StringValue: "AWS::S3::ObjectKey"
              - Key: "default"
                StringValue: 
                  !Join [ "", [ "s3://", Ref: "UrlReportBucket" ] ]
          - Id: "myDDBTableName"
            Attributes: 
              - Key: "description"
                StringValue: "DynamoDB Table Name"
              - Key: "type"
                StringValue: "String"
          - Id: "myDDBRegion"
            Attributes:
              - Key: "description"
                StringValue: "DynamoDB region"
        ParameterValues: 
          - Id: "myDDBTableName"
            StringValue: 
              Ref: "UrlReport"
          - Id: "myDDBRegion"
            StringValue: "eu-west-1"
        PipelineObjects: 
          - Id: "S3BackupLocation"
            Name: "Copy data to this S3 location"
            Fields: 
              - Key: "type"
                StringValue: "S3DataNode"
              - Key: "dataFormat"
                RefValue: "DDBExportFormat"
              - Key: "directoryPath"
                StringValue: "#{myOutputS3Loc}/#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}"
          - Id: "DDBSourceTable"
            Name: "DDBSourceTable"
            Fields: 
              - Key: "tableName"
                StringValue: "#{myDDBTableName}"
              - Key: "type"
                StringValue: "DynamoDBDataNode"
              - Key: "dataFormat"
                RefValue: "DDBExportFormat"
              - Key: "readThroughputPercent"
                StringValue: "#{myDDBReadThroughputRatio}"
          - Id: "DDBExportFormat"
            Name: "DDBExportFormat"
            Fields: 
              - Key: "type"
                StringValue: "DynamoDBExportDataFormat"
          - Id: "TableBackupActivity"
            Name: "TableBackupActivity"
            Fields: 
              - Key: "resizeClusterBeforeRunning"
                StringValue: "true"
              - Key: "type"
                StringValue: "EmrActivity"
              - Key: "input"
                RefValue: "DDBSourceTable"
              - Key: "runsOn"
                RefValue: "EmrClusterForBackup"
              - Key: "output"
                RefValue: "S3BackupLocation"
              - Key: "step"
                RefValue: "s3://dynamodb-dpl-#{myDDBRegion}/emr-ddb-storage-handler/4.11.0/emr-dynamodb-tools-4.11.0-SNAPSHOT-jar-with-dependencies.jar,org.apache.hadoop.dynamodb.tools.DynamoDBExport,#{output.directoryPath},#{input.tableName},#{input.readThroughputPercent}"
          - Id: "DefaultSchedule"
            Name: "Every 1 day"
            Fields: 
              - Key: "occurrences"
                StringValue: "1"
              - Key: "startDateTime"
                StringValue: "2020-09-17T1:00:00"
              - Key: "type"
                StringValue: "Schedule"
              - Key: "period"
                StringValue: "1 Day"
          - Id: "Default"
            Name: "Default"
            Fields: 
              - Key: "type"
                StringValue: "Default"
              - Key: "scheduleType"
                StringValue: "cron"
              - Key: "failureAndRerunMode"
                StringValue: "CASCADE"
              - Key: "role"
                StringValue: "DatapipelineDefaultRole"
              - Key: "resourceRole"
                StringValue: "DatapipelineDefaultResourceRole"
              - Key: "schedule"
                RefValue: "DefaultSchedule"
          - Id: "EmrClusterForBackup"
            Name: "EmrClusterForBackup"
            Fields: 
              - Key: "terminateAfter"
                StringValue: "2 Hours"
              - Key: "masterInstanceType"
                StringValue: "m3.xlarge"
              - Key: "coreInstanceType"
                StringValue: "m3.xlarge"
              - Key: "coreInstanceCount"
                StringValue: "1"
              - Key: "type"
                StringValue: "EmrCluster"
              - Key: "releaseLabel"
                StringValue: "emr-5.23.0"
              - Key: "region"
                StringValue: "#{myDDBRegion}"
Asfar Irshad
  • When you have #{myOutputS3Loc} in your code, is that to reference an environment variable or something? With serverless I have had to use $ in place of where you have #. Could you try hardcoding the values instead of using this format, just to rule that out as an issue? – AnonymousAlias Sep 17 '20 at 18:10
  • #{} will be replaced at runtime by Data Pipeline. I also tried hard-coding these values, but no luck. – Asfar Irshad Sep 20 '20 at 07:57
  • It looks like for "step" you have multiple values configured for refValue, is this correct? – AnonymousAlias Sep 21 '20 at 08:59

2 Answers


I solved it with the help of the AWS support team. As of today, the following is the YAML code that creates a data pipeline for on-demand (pay-per-request) DynamoDB tables.

You can also convert this to JSON if you want.

    UrlReportBucket:
      Type: AWS::S3::Bucket
      Properties:
        BucketName: ***bucketname***

    UrlReportDataPipeline:
      Type: AWS::DataPipeline::Pipeline
      Properties: 
        Name: ***pipelinename***
        Activate: true
        ParameterObjects: 
          - Id: "myDDBReadThroughputRatio"
            Attributes: 
              - Key: "description"
                StringValue: "DynamoDB read throughput ratio"
              - Key: "type"
                StringValue: "Double"
              - Key: "default"
                StringValue: "0.9"
          - Id: "myOutputS3Loc"
            Attributes: 
              - Key: "description"
                StringValue: "S3 output bucket"
              - Key: "type"
                StringValue: "AWS::S3::ObjectKey"
              - Key: "default"
                StringValue: 
                  !Join [ "", [ "s3://", Ref: "UrlReportBucket" ] ]
          - Id: "myDDBTableName"
            Attributes: 
              - Key: "description"
                StringValue: "DynamoDB Table Name"
              - Key: "type"
                StringValue: "String"
          - Id: "myDDBRegion"
            Attributes:
              - Key: "description"
                StringValue: "DynamoDB region"
        ParameterValues: 
          - Id: "myDDBTableName"
            StringValue: 
              Ref: "UrlReport"
          - Id: "myDDBRegion"
            StringValue: "eu-west-1"
        PipelineObjects: 
          - Id: "S3BackupLocation"
            Name: "Copy data to this S3 location"
            Fields: 
              - Key: "type"
                StringValue: "S3DataNode"
              - Key: "dataFormat"
                RefValue: "DDBExportFormat"
              - Key: "directoryPath"
                StringValue: "#{myOutputS3Loc}/#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}"
          - Id: "DDBSourceTable"
            Name: "DDBSourceTable"
            Fields: 
              - Key: "tableName"
                StringValue: "#{myDDBTableName}"
              - Key: "type"
                StringValue: "DynamoDBDataNode"
              - Key: "dataFormat"
                RefValue: "DDBExportFormat"
              - Key: "readThroughputPercent"
                StringValue: "#{myDDBReadThroughputRatio}"
          - Id: "DDBExportFormat"
            Name: "DDBExportFormat"
            Fields: 
              - Key: "type"
                StringValue: "DynamoDBExportDataFormat"
          - Id: "TableBackupActivity"
            Name: "TableBackupActivity"
            Fields: 
              - Key: "resizeClusterBeforeRunning"
                StringValue: "true"
              - Key: "type"
                StringValue: "EmrActivity"
              - Key: "input"
                RefValue: "DDBSourceTable"
              - Key: "runsOn"
                RefValue: "EmrClusterForBackup"
              - Key: "output"
                RefValue: "S3BackupLocation"
              - Key: "step"
                StringValue: "s3://dynamodb-dpl-#{myDDBRegion}/emr-ddb-storage-handler/4.11.0/emr-dynamodb-tools-4.11.0-SNAPSHOT-jar-with-dependencies.jar,org.apache.hadoop.dynamodb.tools.DynamoDBExport,#{output.directoryPath},#{myDDBTableName},#{myDDBReadThroughputRatio}"
          - Id: "DefaultSchedule"
            Name: "Every 1 day"
            Fields: 
              - Key: "occurrences"
                StringValue: "1"
              - Key: "startDateTime"
                StringValue: "2020-09-23T1:00:00"
              - Key: "type"
                StringValue: "Schedule"
              - Key: "period"
                StringValue: "1 Day"
          - Id: "Default"
            Name: "Default"
            Fields: 
              - Key: "type"
                StringValue: "Default"
              - Key: "scheduleType"
                StringValue: "cron"
              - Key: "failureAndRerunMode"
                StringValue: "CASCADE"
              - Key: "role"
                StringValue: "DatapipelineDefaultRole"
              - Key: "resourceRole"
                StringValue: "DatapipelineDefaultResourceRole"
              - Key: "schedule"
                RefValue: "DefaultSchedule"
          - Id: "EmrClusterForBackup"
            Name: "EmrClusterForBackup"
            Fields: 
              - Key: "terminateAfter"
                StringValue: "2 Hours"
              - Key: "masterInstanceType"
                StringValue: "m3.xlarge"
              - Key: "coreInstanceType"
                StringValue: "m3.xlarge"
              - Key: "coreInstanceCount"
                StringValue: "1"
              - Key: "type"
                StringValue: "EmrCluster"
              - Key: "releaseLabel"
                StringValue: "emr-5.23.0"
              - Key: "region"
                StringValue: "#{myDDBRegion}"
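
For anyone comparing this with the template in the question: the functional change is in the TableBackupActivity step field. The failing version used RefValue, which must point at the Id of another PipelineObject, while the working one passes the EMR step as a plain string via StringValue and references the parameters directly instead of the input.* runtime expressions:

```yaml
# Failing (RefValue cannot hold an inline EMR step string):
- Key: "step"
  RefValue: "s3://dynamodb-dpl-#{myDDBRegion}/emr-ddb-storage-handler/4.11.0/emr-dynamodb-tools-4.11.0-SNAPSHOT-jar-with-dependencies.jar,org.apache.hadoop.dynamodb.tools.DynamoDBExport,#{output.directoryPath},#{input.tableName},#{input.readThroughputPercent}"

# Working (inline step strings go in StringValue):
- Key: "step"
  StringValue: "s3://dynamodb-dpl-#{myDDBRegion}/emr-ddb-storage-handler/4.11.0/emr-dynamodb-tools-4.11.0-SNAPSHOT-jar-with-dependencies.jar,org.apache.hadoop.dynamodb.tools.DynamoDBExport,#{output.directoryPath},#{myDDBTableName},#{myDDBReadThroughputRatio}"
```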

Step has a RefValue that points to multiple resources, and they are specified as a string. According to the AWS documentation, a RefValue is:

A field value that you specify as an identifier of another object in the same pipeline definition.

If you look at where you use S3BackupLocation, it is created under PipelineObjects and then referenced using its Id.

For step you have a RefValue with a string for its value; that string contains commas, so it looks like it is specifying multiple objects.

I am not sure what step is meant to be, but if you want to use RefValue, create the object somewhere else in the template and use its Id here.

You could also try using StringValue here instead of RefValue.
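
To illustrate the distinction (a sketch based on the definitions already in the question; the jar path is abbreviated here): RefValue must name the Id of another PipelineObject, while literal values, including #{...} runtime expressions, belong in StringValue:

```yaml
- Id: "TableBackupActivity"
  Fields:
    # RefValue: the Id of another object defined under PipelineObjects
    - Key: "output"
      RefValue: "S3BackupLocation"
    # StringValue: a literal string (runtime expressions like #{...} are fine here)
    - Key: "step"
      StringValue: "s3://dynamodb-dpl-#{myDDBRegion}/.../emr-dynamodb-tools-4.11.0-SNAPSHOT-jar-with-dependencies.jar,org.apache.hadoop.dynamodb.tools.DynamoDBExport,#{output.directoryPath},#{myDDBTableName},#{myDDBReadThroughputRatio}"
```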

AnonymousAlias
  • Those RefValues are pipeline runtime identifiers. If I supply the value of #{output.directoryPath} directly, it creates a new S3 bucket every time the EMR activity runs. – Asfar Irshad Sep 29 '20 at 10:36
  • Yes, but you had a RefValue for step where you should have had a StringValue, as I mentioned in my answer. I see you have made this change in your answer. – AnonymousAlias Sep 29 '20 at 22:57