Questions tagged [amazon-data-pipeline]

Simple service to transfer data between Amazon data storage services, kick off Elastic MapReduce jobs, and connect with outside data services.

From the AWS Data Pipeline homepage:

AWS Data Pipeline is a web service that helps you reliably process and move data between different AWS compute and storage services as well as on-premise data sources at specified intervals. With AWS Data Pipeline, you can regularly access your data where it’s stored, transform and process it at scale, and efficiently transfer the results to AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon Elastic MapReduce (EMR).

AWS Data Pipeline helps you easily create complex data processing workloads that are fault tolerant, repeatable, and highly available. You don’t have to worry about ensuring resource availability, managing inter-task dependencies, retrying transient failures or timeouts in individual tasks, or creating a failure notification system. AWS Data Pipeline also allows you to move and process data that was previously locked up in on-premise data silos.

470 questions
4 votes, 2 answers

AWS Data Pipeline Support for SQL Server RDS

I am trying to find documentation regarding the supported data sources for AWS Data Pipeline. What I need to do is export SQL Server RDS data to S3. I am finding plenty of documentation saying that Data Pipeline can use RDS as a source, but every…
Brian Amersi • 111 • 1 • 6
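
For the question above, documentation on engine support is indeed thin. The approach that usually gets suggested is to define the SQL Server instance as a generic JdbcDatabase rather than an RdsDatabase and copy from a SqlDataNode to an S3DataNode. Below is a rough boto3 sketch of that shape; the object types and field names come from the Data Pipeline object reference, but the connection string, driver class, roles, bucket, and credentials are placeholders, and a real pipeline will likely also need a dataFormat on the S3 node and possibly a jdbcDriverJarUri for the SQL Server driver.

    import boto3

    def f(key, value, ref=False):
        # One pipeline-object field in the put_pipeline_definition key/value format.
        return {'key': key, 'refValue': value} if ref else {'key': key, 'stringValue': value}

    dp = boto3.client('datapipeline', region_name='us-east-1')
    pipeline_id = dp.create_pipeline(name='sqlserver-to-s3',
                                     uniqueId='sqlserver-to-s3-demo')['pipelineId']

    objects = [
        {'id': 'Default', 'name': 'Default', 'fields': [
            f('scheduleType', 'ondemand'),
            f('failureAndRerunMode', 'CASCADE'),
            f('role', 'DataPipelineDefaultRole'),
            f('resourceRole', 'DataPipelineDefaultResourceRole'),
            f('pipelineLogUri', 's3://my-log-bucket/datapipeline/')]},
        {'id': 'SourceDb', 'name': 'SourceDb', 'fields': [
            f('type', 'JdbcDatabase'),
            f('connectionString', 'jdbc:sqlserver://my-endpoint:1433;databaseName=mydb'),
            f('jdbcDriverClass', 'com.microsoft.sqlserver.jdbc.SQLServerDriver'),
            f('username', 'my_user'),
            f('*password', 'my_password')]},
        {'id': 'SourceTable', 'name': 'SourceTable', 'fields': [
            f('type', 'SqlDataNode'),
            f('database', 'SourceDb', ref=True),
            f('table', 'orders'),
            f('selectQuery', 'select * from orders')]},
        {'id': 'OutputS3', 'name': 'OutputS3', 'fields': [
            f('type', 'S3DataNode'),
            f('directoryPath', 's3://my-export-bucket/orders/')]},
        {'id': 'CopyRunner', 'name': 'CopyRunner', 'fields': [
            f('type', 'Ec2Resource'),
            f('instanceType', 't1.micro'),
            f('terminateAfter', '2 Hours')]},
        {'id': 'CopyOrders', 'name': 'CopyOrders', 'fields': [
            f('type', 'CopyActivity'),
            f('input', 'SourceTable', ref=True),
            f('output', 'OutputS3', ref=True),
            f('runsOn', 'CopyRunner', ref=True)]},
    ]

    dp.put_pipeline_definition(pipelineId=pipeline_id, pipelineObjects=objects)
    dp.activate_pipeline(pipelineId=pipeline_id)
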
4 votes, 2 answers

Amazon AWS: DataPipelineDefaultRole/EDPSession not authorized to perform iam:ListRolePolicies

I have been assigned an IAM role in AWS by my manager and am trying to set up an AWS Data Pipeline. I repeatedly run into permission and authorization issues like the following when trying to activate the pipeline. WARNING: Error…
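
The error in the question above names both the role and the missing action, so a common first step is to have whoever administers IAM check what is attached to DataPipelineDefaultRole and, if appropriate, attach AWS's managed Data Pipeline role policy. A hedged boto3 sketch of that check and fix follows; the policy ARN is the managed policy name as I recall it, so verify it in the IAM console, and the attach call itself needs IAM admin rights the asker may not have.

    import boto3

    iam = boto3.client('iam')

    # See what the role can currently do.
    attached = iam.list_attached_role_policies(RoleName='DataPipelineDefaultRole')
    print([p['PolicyName'] for p in attached['AttachedPolicies']])

    # Attach the AWS managed policy for the Data Pipeline service role (assumption:
    # this is the policy the role is missing; requires IAM admin permissions).
    iam.attach_role_policy(
        RoleName='DataPipelineDefaultRole',
        PolicyArn='arn:aws:iam::aws:policy/service-role/AWSDataPipelineRole',
    )
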
3 votes, 1 answer

How to export DynamoDB table data without the point in time recovery?

I am trying to export the last 15 days of data from a DynamoDB table, but unfortunately point-in-time recovery is not enabled, so I can't use the new DynamoDB export-to-S3 feature because it's not retroactive. I have tried using the AWS Data…
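
Since point-in-time recovery only covers the window after it is enabled, there is no retroactive export; the usual fallback is a filtered Scan of the live table. A minimal sketch, assuming a hypothetical created_at ISO-8601 string attribute and a placeholder table name; note that a Scan still reads every item and consumes read capacity for the whole table.

    import json
    from datetime import datetime, timedelta, timezone

    import boto3
    from boto3.dynamodb.conditions import Attr

    table = boto3.resource('dynamodb').Table('my-table')
    cutoff = (datetime.now(timezone.utc) - timedelta(days=15)).isoformat()

    # Scan the whole table, keep only items newer than the cutoff, write JSON lines.
    kwargs = {'FilterExpression': Attr('created_at').gte(cutoff)}
    with open('export.jsonl', 'w') as out:
        while True:
            page = table.scan(**kwargs)
            for item in page['Items']:
                out.write(json.dumps(item, default=str) + '\n')
            if 'LastEvaluatedKey' not in page:
                break
            kwargs['ExclusiveStartKey'] = page['LastEvaluatedKey']
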
3 votes, 1 answer

AWS Data Pipeline import from an S3 bucket to a DynamoDB table in a different region gives an error

When I use a data pipeline to import into a DynamoDB table that is in the same region as the pipeline, it works without error. When I modify the EMRClusterForLoad step to use a region that is different from the region that the…
3 votes, 1 answer

How to solve "DriverClass not found for database:mariadb" with AWS data pipeline?

I'm trying to play with AWS Data Pipeline (and then Glue later) and am following Copy MySQL Data Using the AWS Data Pipeline Console. However, when I execute the pipeline, I get "DriverClass not found for database:mariadb". I would expect this to…
Chris F • 14,337 • 30 • 94 • 192
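
The usual diagnosis here is that the tutorial's RdsDatabase object lets Data Pipeline pick the driver from the reported RDS engine, and newer MySQL-compatible instances report an engine the bundled drivers don't map to. A workaround that comes up is declaring the database as a JdbcDatabase with an explicit MySQL driver class (and, if needed, a jdbcDriverJarUri pointing at a driver JAR in S3). A sketch of that object in the put_pipeline_definition field format, with a placeholder endpoint and credentials:

    # Hypothetical JdbcDatabase object to use in place of the tutorial's RdsDatabase.
    rds_mysql = {
        'id': 'rds_mysql',
        'name': 'rds_mysql',
        'fields': [
            {'key': 'type', 'stringValue': 'JdbcDatabase'},
            {'key': 'connectionString',
             'stringValue': 'jdbc:mysql://my-instance.us-east-1.rds.amazonaws.com:3306/mydb'},
            {'key': 'jdbcDriverClass', 'stringValue': 'com.mysql.jdbc.Driver'},
            {'key': 'username', 'stringValue': 'my_user'},
            {'key': '*password', 'stringValue': 'my_password'},
        ],
    }
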
3 votes, 0 answers

When using the Data Pipeline to backup a DynamoDB table, does readThroughputPercent account for autoscaling?

Suppose I set my DDB table to autoscale at 80% utilization and set the backup data pipeline's read throughput ratio to 0.85. Does the pipeline use the read throughput it determined initially, or does it scale up along with the table?
3 votes, 1 answer

AWS Data Pipeline S3 to DynamoDB JSON Error

I'm trying to import a TSV file from S3 into DynamoDB using Data Pipeline, but I keep hitting a MalformedJsonException. I've validated both pieces of JSON that I provide, the definition of the data pipeline and the manifest of the S3 folder, so…
tghw • 25,208 • 13 • 70 • 96
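
One thing worth ruling out before blaming the pipeline itself: MalformedJsonException can come from the EMR import step even when each file parses, but it is quick to confirm the two inputs really are syntactically valid and to get an exact line and column if one is not. A small sketch; the file names are placeholders for local copies of the pipeline definition and the manifest.

    import json

    for path in ('pipeline-definition.json', 'manifest'):
        with open(path) as fh:
            try:
                json.load(fh)
                print(f'{path}: parses as JSON')
            except json.JSONDecodeError as err:
                print(f'{path}: line {err.lineno}, column {err.colno}: {err.msg}')
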
3 votes, 1 answer

What should the return value from a ShellCommandPrecondition in AWS Data Pipeline be?

I am writing a shell script that will be executed by a ShellCommandPrecondition in AWS Data Pipeline. The AWS documentation doesn't specify what the return value from the script should be. Can I just return 0 on success and 1 (or any other value) if…
Shekhar • 11,438 • 36 • 130 • 186
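
The convention appears to be the ordinary shell one: the precondition is satisfied when the command exits with status 0 and unsatisfied on any non-zero status. A minimal precondition script sketch in Python; the bucket and key it checks are made-up examples.

    import sys
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client('s3')
    try:
        s3.head_object(Bucket='my-input-bucket', Key='incoming/today.csv')
        sys.exit(0)   # exit 0: precondition is met
    except ClientError:
        sys.exit(1)   # non-zero exit: precondition is not met
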
3 votes, 1 answer

Data Pipeline failing for EMR Activity

I am trying to run a Spark step on AWS Data Pipeline. I am getting the following exception: amazonaws.datapipeline.taskrunner.TaskExecutionException: Failed to complete EMR transform. at …
Sanchay • 1,053 • 1 • 16 • 33
3 votes, 1 answer

How to compute 'DynamoDB read throughput ratio' while setting up DataPipeline to export DynamoDB data to S3

I have a DynamoDB table with ~16M records, each about 4 KB in size. The table is configured for autoscaling with a target utilization of 70%, a minimum provisioned read capacity of 250, and a maximum provisioned write capacity of 3000. I am trying to…
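
For this question (and the autoscaling one a few entries up), a back-of-the-envelope calculation makes the trade-off concrete. Using the figures given here, roughly 16M items of about 4 KB each and 250 provisioned RCUs, and assuming the export uses eventually consistent reads (one RCU covering two 4 KB reads per second), the read ratio mainly sets how long the export runs. Treat the numbers as an estimate, not the exporter's exact behavior.

    # Rough export-time estimate for a full-table read at a given throughput ratio.
    items = 16_000_000
    item_size_kb = 4
    provisioned_rcu = 250      # table's provisioned read capacity
    ratio = 0.70               # DynamoDB read throughput ratio given to the pipeline

    # Assumption: eventually consistent reads; one read of a 4 KB item costs 0.5 RCU.
    rcu_per_item = 0.5 * max(item_size_kb / 4, 1)
    items_per_second = provisioned_rcu * ratio / rcu_per_item
    hours = items / items_per_second / 3600
    print(f'~{hours:.1f} hours at ratio {ratio}')   # ~12.7 hours with these numbers
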
3 votes, 1 answer

How to set up Google Cloud Storage correctly for a Spark application using AWS Data Pipeline

I am setting up the cluster step to run a Spark application using AWS Data Pipeline. My job reads data from S3, processes it, and writes the results to Google Cloud Storage. For Google Cloud Storage, I am using a service account with a key file.…
3 votes, 1 answer

EMR activity using data pipeline for spark job

I am trying to run a JAR file for a Spark job in Data Pipeline, but I am not sure exactly what I need to pass in the EMR step.
Monika Patel • 35 • 1 • 6
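
For this question and the EMR-transform failure a few entries up, the detail that most often trips people up is the step format: an EmrActivity takes each step as a single comma-separated string, and the commas become the arguments handed to the cluster. On release-label EMR clusters a Spark job goes through command-runner.jar and spark-submit. A sketch of such a step as a pipeline object; the JAR path, main class, and input argument are placeholders, and the runsOn reference assumes an EmrCluster object named EmrClusterForJob elsewhere in the definition.

    # Comma-separated EMR step that submits a Spark job via command-runner.jar.
    spark_step = ','.join([
        'command-runner.jar',
        'spark-submit',
        '--deploy-mode', 'cluster',
        '--class', 'com.example.MySparkJob',       # placeholder main class
        's3://my-bucket/jars/my-spark-job.jar',    # placeholder application JAR
        's3://my-bucket/input/',                   # placeholder argument
    ])

    emr_activity = {
        'id': 'SparkStep',
        'name': 'SparkStep',
        'fields': [
            {'key': 'type', 'stringValue': 'EmrActivity'},
            {'key': 'runsOn', 'refValue': 'EmrClusterForJob'},   # hypothetical cluster id
            {'key': 'step', 'stringValue': spark_step},
        ],
    }
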
3 votes, 1 answer

Is it possible to create an EMR cluster with Auto Scaling using Data Pipeline?

I am new to AWS. I have created an EMR cluster with an auto scaling policy through the AWS console. I have also created a data pipeline that can use this cluster to perform the activities. I am also able to create an EMR cluster dynamically through data…
3 votes, 1 answer

Importing data from Excel sheet to DynamoDB table

I am having a problem importing data from an Excel sheet to an Amazon DynamoDB table. I have the Excel sheet in an Amazon S3 bucket and want to import data from it into a DynamoDB table. Currently I am following Import and Export DynamoDB…
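
The Data Pipeline DynamoDB import expects its own export format rather than a spreadsheet, so for a one-off load it can be simpler to bypass the pipeline and write the rows directly. A sketch assuming a hypothetical workbook with id and name columns, plus placeholder bucket, key, and table names; it needs pandas and openpyxl installed where it runs.

    # Read an .xlsx file from S3 and batch-write its rows into DynamoDB.
    import boto3
    import pandas as pd

    s3 = boto3.client('s3')
    s3.download_file('my-bucket', 'imports/items.xlsx', '/tmp/items.xlsx')

    df = pd.read_excel('/tmp/items.xlsx')            # uses openpyxl under the hood
    table = boto3.resource('dynamodb').Table('my-table')

    with table.batch_writer() as batch:
        for row in df.to_dict(orient='records'):
            batch.put_item(Item={'id': str(row['id']), 'name': str(row['name'])})
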
3 votes, 0 answers

How to mark an AWS data pipeline as FINISHED rather than ERROR when an S3 precondition fails?

I've been struggling for a few weeks to find a configuration that works the way I expect; maybe what I want isn't possible... Here's what I'm trying to do: check an S3 bucket for any files. If there aren't any, don't spin up a cluster; just mark the…
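
One possible way around this, sketched below with heavy caveats: instead of an S3 precondition (whose failure eventually surfaces as an error), run the existence check itself as a ShellCommandActivity on a small Ec2Resource and always exit 0, so an empty bucket ends the run as FINISHED; the decision to kick off the heavier work then has to live downstream of this check. Whether that fits depends on how the rest of the pipeline is wired, so treat it as a sketch rather than a confirmed pattern. Bucket and prefix are placeholders.

    # Exit 0 whether or not input files exist, so the activity (and the run) ends
    # as FINISHED instead of ERROR.
    import sys
    import boto3

    s3 = boto3.client('s3')
    resp = s3.list_objects_v2(Bucket='my-input-bucket', Prefix='incoming/', MaxKeys=1)

    if resp.get('KeyCount', 0) == 0:
        print('no input files; nothing to do')
        sys.exit(0)

    print('input files found; downstream work would start here')
    sys.exit(0)
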