Questions tagged [amazon-data-pipeline]

Simple service to transfer data between Amazon data storage services, kick off Elastic MapReduce jobs, and connect with outside data services.

From the AWS Data Pipeline homepage:

AWS Data Pipeline is a web service that helps you reliably process and move data between different AWS compute and storage services as well as on-premise data sources at specified intervals. With AWS Data Pipeline, you can regularly access your data where it’s stored, transform and process it at scale, and efficiently transfer the results to AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon Elastic MapReduce (EMR).

AWS Data Pipeline helps you easily create complex data processing workloads that are fault tolerant, repeatable, and highly available. You don’t have to worry about ensuring resource availability, managing inter-task dependencies, retrying transient failures or timeouts in individual tasks, or creating a failure notification system. AWS Data Pipeline also allows you to move and process data that was previously locked up in on-premise data silos.
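
Under the hood, a pipeline is described as a JSON document containing a list of objects: a Default configuration object, a Schedule, one or more resources, and the activities that run on them. The following is a minimal sketch of such a definition; the bucket name, schedule, and instance settings are placeholder assumptions rather than values taken from any question below, and several of the later sketches on this page reuse the MyEc2 resource and example bucket defined here.

    {
      "objects": [
        {
          "id": "Default",
          "name": "Default",
          "scheduleType": "cron",
          "failureAndRerunMode": "CASCADE",
          "role": "DataPipelineDefaultRole",
          "resourceRole": "DataPipelineDefaultResourceRole",
          "pipelineLogUri": "s3://my-example-bucket/logs/",
          "schedule": { "ref": "DailySchedule" }
        },
        {
          "id": "DailySchedule",
          "type": "Schedule",
          "period": "1 days",
          "startAt": "FIRST_ACTIVATION_DATE_TIME"
        },
        {
          "id": "MyEc2",
          "type": "Ec2Resource",
          "instanceType": "t1.micro",
          "terminateAfter": "30 Minutes"
        },
        {
          "id": "SayHello",
          "type": "ShellCommandActivity",
          "runsOn": { "ref": "MyEc2" },
          "command": "echo hello from Data Pipeline"
        }
      ]
    }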

470 questions
0 votes, 1 answer

AWS Data Pipeline - SQLActivity - update statement possible?

I need to build a data pipeline that takes input from a CSV file (stored on S3) and "updates" records in an Aurora RDS table. I understand the standard format (out-of-the-box template) for bulk record insertion, but for the records update or…
Atul
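
For the update question above: there is no out-of-the-box "update" template, and a common workaround is to load the CSV into a staging table first and then run the upsert as the script of a SqlActivity. The sketch below uses hypothetical table and column names and assumes an RdsDatabase object called MyAuroraDb is defined elsewhere in the pipeline.

    {
      "id": "UpsertFromStaging",
      "type": "SqlActivity",
      "database": { "ref": "MyAuroraDb" },
      "runsOn": { "ref": "MyEc2" },
      "script": "UPDATE target t JOIN staging s ON t.id = s.id SET t.val = s.val; INSERT INTO target SELECT s.* FROM staging s WHERE NOT EXISTS (SELECT 1 FROM target t WHERE t.id = s.id);"
    }

Whether several statements can go into one script depends on the JDBC driver in use, so splitting them across two SqlActivity objects is a safer variant.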
0 votes, 1 answer

Task runner is not running on my local machine

I am running Task Runner to perform the defined task; while running it, I get an exception saying that it can't upload log files to S3. After debugging the Task Runner application, I found that it uses the ACL option to upload Task Runner log files…
0 votes, 1 answer

AWS Data Pipeline - import randomly named files in an S3 bucket to Redshift

I have a use case where new files can show up in an S3 folder at any time, and we would like to import them into Redshift via the RedshiftCopyActivity. I have a pipeline set up where we can move data from S3 to Redshift - but with files that are…
sumit
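
For the randomly named files question above, one approach is to point the S3DataNode at the whole prefix via directoryPath, so every object under it is picked up by the RedshiftCopyActivity regardless of its name. A sketch with placeholder bucket, table, and database names (MyRedshiftDb is assumed to be a RedshiftDatabase object defined elsewhere):

    {
      "id": "IncomingFiles",
      "type": "S3DataNode",
      "directoryPath": "s3://my-example-bucket/incoming/"
    },
    {
      "id": "TargetTable",
      "type": "RedshiftDataNode",
      "tableName": "my_table",
      "database": { "ref": "MyRedshiftDb" }
    },
    {
      "id": "LoadToRedshift",
      "type": "RedshiftCopyActivity",
      "insertMode": "APPEND",
      "input": { "ref": "IncomingFiles" },
      "output": { "ref": "TargetTable" },
      "runsOn": { "ref": "MyEc2" }
    }

Each run loads everything under the prefix, so moving or archiving already imported files usually has to be handled as a separate step.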
0 votes, 2 answers

Using AWS Data Pipeline PigActivity

I am trying to get a simple PigActivity to work in Data Pipeline. http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-pigactivity.html#pigactivity The Input and Output fields are required for this activity. I have them both set…
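
For reference, a minimal shape for a PigActivity with its required Input and Output fields might look like the sketch below; the bucket paths, CSV data format, and cluster settings are placeholders, and stage is set to "true" so the data nodes are staged for the Pig script:

    {
      "id": "MyEmr",
      "type": "EmrCluster",
      "terminateAfter": "1 Hours"
    },
    {
      "id": "CsvFormat",
      "type": "CSV"
    },
    {
      "id": "PigIn",
      "type": "S3DataNode",
      "directoryPath": "s3://my-example-bucket/pig-input/",
      "dataFormat": { "ref": "CsvFormat" }
    },
    {
      "id": "PigOut",
      "type": "S3DataNode",
      "directoryPath": "s3://my-example-bucket/pig-output/",
      "dataFormat": { "ref": "CsvFormat" }
    },
    {
      "id": "MyPig",
      "type": "PigActivity",
      "stage": "true",
      "input": { "ref": "PigIn" },
      "output": { "ref": "PigOut" },
      "runsOn": { "ref": "MyEmr" },
      "scriptUri": "s3://my-example-bucket/scripts/job.pig"
    }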
0 votes, 2 answers

AWS DataPipeline: RedshiftCopyActivity OVERWRITE_EXISTING not enforcing primary key

I have a Data Pipeline that exports data from a local DB to Redshift via S3 (very similar to the "Incremental copy of RDS MySQL table to Redshift" template). I have defined a primary key and set insertMode to "OVERWRITE_EXISTING" in the pipeline definition,…
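
For context on the question above: Redshift declares primary keys but does not enforce them, so OVERWRITE_EXISTING can only replace rows if Data Pipeline knows which columns make up the key. As an assumption worth verifying against the current documentation, the RedshiftDataNode accepts an explicit primaryKeys field, which can be set when the key declared on the table is not picked up:

    {
      "id": "TargetTable",
      "type": "RedshiftDataNode",
      "tableName": "my_table",
      "primaryKeys": ["id"],
      "database": { "ref": "MyRedshiftDb" }
    }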
0 votes, 2 answers

DynamoDB backup via AWS Data Pipeline and EMR

We are trying to get backups of a DynamoDB table to S3 via AWS Data Pipeline. We are using the default template for this, provided by AWS (http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-importexport-ddb-part2.html). However, the…
0 votes, 1 answer

How to run a subtree under AWS Data Pipeline

Is it possible to run a subpart of the whole dependency tree under AWS Data Pipeline? From what I could understand, there is no way to do that: either one has to rerun the entire pipeline or just some single SQL activity, which becomes difficult to…
Siddhant Jain
0 votes, 1 answer

AWS Data Pipeline: setting local variable in shell command

I am trying to make use of the uuid library within a shell command invoked by an AWS Data Pipeline. It seems like the uuid function works fine, but when I try to assign this value to a variable, the data is lost. A snippet of my testing script…
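
For the uuid question above: a variable assigned inside a ShellCommandActivity only exists for that single shell invocation, so the assignment and its use have to live in the same command string. A sketch, using uuidgen as a stand-in for the questioner's uuid utility and reusing the placeholder resource from the first example:

    {
      "id": "MakeUuid",
      "type": "ShellCommandActivity",
      "runsOn": { "ref": "MyEc2" },
      "command": "MY_UUID=$(uuidgen); echo \"run id: ${MY_UUID}\""
    }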
0 votes, 1 answer

How can I attach an EBS Volume to an EMR Cluster using the AWS Data Pipeline?

AWS has recently made it possible to attach an EBS volume to specific cluster instance types such as m4s. While it is possible to attach an EBS volume using EMR, I cannot seem to find a way to do so via AWS Data Pipeline. Am I missing…
dyltini
0 votes, 1 answer

Storing error messages in Redshift through Data Pipeline

I am trying to run a SQL activity on a Redshift cluster through Data Pipeline. After the SQL activity, a few logs need to be written to a table in Redshift [such as the number of rows affected and the error message (if any)]. Requirement: if the SQL activity is…
0 votes, 1 answer

Redshift SqlActivity: How to reference input and output in a script

I have a Data Pipeline where I'm using a Redshift SqlActivity that reads from one Redshift table and writes to another Redshift table. I would like to know if it is possible to reference the input and output fields from the SqlActivity, e.g. INSERT INTO…
Guillaume Mercey
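
Assuming the expression language resolves fields through an activity's input and output references in the same way it does for other activity types, the table names can be pulled into the script with #{input.tableName} and #{output.tableName}. A sketch with placeholder node and database names:

    {
      "id": "CopyBetweenTables",
      "type": "SqlActivity",
      "database": { "ref": "MyRedshiftDb" },
      "runsOn": { "ref": "MyEc2" },
      "input": { "ref": "SourceTable" },
      "output": { "ref": "TargetTable" },
      "script": "INSERT INTO #{output.tableName} SELECT * FROM #{input.tableName};"
    }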
0 votes, 1 answer

SQLActivity: Invalid Role for 'DataPipelineDefaultRole' with full access

I want to run a SQL query using AWS Data Pipeline. I have read the SQL activity info on their support page. I am getting the error message: Object:DefaultSqlActivity1 WARNING: Invalid role: 'DataPipelineDefaultRole'. Please confirm AWS IAM Role…
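
A frequent cause of this warning is not the role's attached permissions but its trust relationship: DataPipelineDefaultRole has to be assumable by the Data Pipeline (and EMR) service principals, and full-access policies alone do not fix a broken trust policy. A sketch of the expected trust relationship:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": [
              "datapipeline.amazonaws.com",
              "elasticmapreduce.amazonaws.com"
            ]
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }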
0 votes, 1 answer

How do you transform DynamoDB Map/List types through AWS Data Pipeline to Redshift?

I have two DynamoDB tables, one with a Map data type (JSON) and one with a List data type (list of JSON). Our current pipeline to Redshift claims these are unsupported data types. How can I transform these columns to Redshift as varchar(MAX)?
0 votes, 1 answer

Internal error message for AWS Data Pipeline

I activated my pipeline that has a simple RDS to S3 copy activity, running on a t1.micro instance, and it failed with the message “Unable to create resource…due to an internal error”. Please refer to the screenshot: (Screenshot of failure…
0 votes, 1 answer

ShellCommandActivity in AWS Data Pipeline

I am transferring DynamoDB data to S3 using Data Pipeline. In the S3 bucket I get the backup, but it is split into multiple files. To get the data in a single file I used a ShellCommandActivity which runs the following command: aws s3 cat…
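
There is no "aws s3 cat" command in the AWS CLI, so one hedged alternative (bucket names and paths below are placeholders) is a ShellCommandActivity that copies the backup parts locally, concatenates them, and uploads the merged result:

    {
      "id": "MergeBackupParts",
      "type": "ShellCommandActivity",
      "runsOn": { "ref": "MyEc2" },
      "command": "aws s3 cp s3://my-example-bucket/backup/ /tmp/backup/ --recursive && cat /tmp/backup/* > /tmp/merged.out && aws s3 cp /tmp/merged.out s3://my-example-bucket/backup-merged/merged.out"
    }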