Questions tagged [amazon-data-pipeline]

Simple service to transfer data between Amazon data storage services, kick off Elastic MapReduce jobs, and connect with outside data services.

From the AWS Data Pipeline homepage:

AWS Data Pipeline is a web service that helps you reliably process and move data between different AWS compute and storage services as well as on-premise data sources at specified intervals. With AWS Data Pipeline, you can regularly access your data where it’s stored, transform and process it at scale, and efficiently transfer the results to AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon Elastic MapReduce (EMR).

AWS Data Pipeline helps you easily create complex data processing workloads that are fault tolerant, repeatable, and highly available. You don’t have to worry about ensuring resource availability, managing inter-task dependencies, retrying transient failures or timeouts in individual tasks, or creating a failure notification system. AWS Data Pipeline also allows you to move and process data that was previously locked up in on-premise data silos.

470 questions

1 answer

How to pass EMR cluster ID in the CloudWatch Alarm

I am trying to create an SNS alarm for my EMR cluster so that I get notified when the cluster fails. But my issue is that I am not able to pass the cluster ID as JobFlowId in the CloudWatch alarm. I am creating all resources using a CloudFormation template…

asked Jul 19 '18 at 07:13

Atharv Thakur
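For the CloudWatch alarm question, a minimal Python sketch of where the cluster ID usually goes. It is passed as the value of a `JobFlowId` dimension in the `AWS/ElasticMapReduce` namespace.

The function only builds keyword arguments for boto3's CloudWatch `put_metric_alarm`. Several choices are assumptions for illustration, not the asker's setup:
- the `AppsFailed` metric, the threshold and the period;
- the helper name `emr_failure_alarm_params`;
- the cluster ID and SNS topic ARN in the usage lines.

In a CloudFormation template, the same `JobFlowId` dimension would carry the ID of the EMR cluster resource.

```python
def emr_failure_alarm_params(cluster_id: str, topic_arn: str) -> dict:
    """Build put_metric_alarm keyword arguments for one EMR cluster.

    Assumption: alert when YARN reports any failed application (AppsFailed > 0).
    """
    return {
        "AlarmName": f"emr-apps-failed-{cluster_id}",
        "Namespace": "AWS/ElasticMapReduce",
        "MetricName": "AppsFailed",
        # The cluster ID is supplied as the JobFlowId dimension.
        "Dimensions": [{"Name": "JobFlowId", "Value": cluster_id}],
        "Statistic": "Maximum",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",
        # Notify this SNS topic when the alarm fires.
        "AlarmActions": [topic_arn],
    }


if __name__ == "__main__":
    # Hypothetical cluster ID and SNS topic ARN.
    params = emr_failure_alarm_params(
        "j-EXAMPLE12345", "arn:aws:sns:us-east-1:123456789012:emr-alerts"
    )
    print(params["Dimensions"])
    # With AWS credentials configured, the alarm could then be created with:
    # boto3.client("cloudwatch").put_metric_alarm(**params)
```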

3 answers

Set 'maxActiveInstances' error

I am using AWS Data Pipeline to export a DDB table, but when I activate it I get an error: Web service limit exceeded: Exceeded number of concurrent executions. Please set the field 'maxActiveInstances' to a higher value in your pipeline or…

amazon-web-services amazon-data-pipeline

asked Apr 27 '18 at 09:17

goelakash
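The error in the `maxActiveInstances` question names a pipeline field. This Python sketch sets that field on one object in a pipeline definition. The definition uses the `id`/`name`/`fields` shape that boto3's Data Pipeline `put_pipeline_definition` accepts as `pipelineObjects`.

Which object should carry the field is an assumption here: the sketch puts it on the activity. The object ID `TableBackupActivity`, its type and the value 5 are hypothetical. Verify the placement against the Data Pipeline object reference before relying on it.

```python
from copy import deepcopy


def set_max_active_instances(objects: list, object_id: str, value: int) -> list:
    """Return a copy of the pipeline objects with maxActiveInstances set on object_id."""
    updated = deepcopy(objects)
    for obj in updated:
        if obj["id"] == object_id:
            # Drop any existing value, then append the new one as a string field.
            fields = [f for f in obj["fields"] if f["key"] != "maxActiveInstances"]
            fields.append({"key": "maxActiveInstances", "stringValue": str(value)})
            obj["fields"] = fields
            return updated
    raise KeyError(f"no pipeline object with id {object_id!r}")


if __name__ == "__main__":
    # Hypothetical single-object definition, for illustration only.
    definition = [
        {
            "id": "TableBackupActivity",
            "name": "TableBackupActivity",
            "fields": [{"key": "type", "stringValue": "EmrActivity"}],
        }
    ]
    print(set_max_active_instances(definition, "TableBackupActivity", 5))
```

The input list is copied rather than mutated, so the original definition stays available for comparison or rollback.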

0 answers

Issue with copying data from S3 to Redshift

I am trying to sync a table from MySQL RDS to Redshift through Data Pipeline. Copying data from RDS to S3 works without issue, but while copying from S3 to Redshift the following issue is seen: amazonaws.datapipeline.taskrunner.TaskExecutionException:…

amazon-web-services amazon-s3 amazon-redshift amazon-rds amazon-data-pipeline

asked Feb 28 '18 at 13:20

Coder 477

1 answer

Load props file in EMR Spark Application

I am trying to load custom properties in my spark application using…

apache-spark emr amazon-data-pipeline

asked Nov 28 '17 at 15:39

Sanchay

3 answers

AWS Data Pipeline Pricing for On-demand Runs

The AWS Data Pipeline documentation provides the following pricing information: high-frequency activities, $1.00 per month; low-frequency activities, $0.60 per month; inactive pipelines, $1.00 per month. High-frequency activities are…

amazon-web-services amazon-data-pipeline

asked Oct 26 '17 at 09:54

user_default
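The rates quoted in the pricing question support a quick monthly estimate. This Python sketch makes two assumptions:
- each rate is charged per activity, or per inactive pipeline, per month, as the excerpt suggests;
- the function name is illustrative.

How on-demand runs are classified is exactly what the question asks, so it is not modeled here. Check current AWS pricing before using the numbers.

```python
# Monthly rates quoted in the question, in integer cents to avoid float rounding.
HIGH_FREQUENCY_CENTS = 100   # $1.00 per high-frequency activity per month
LOW_FREQUENCY_CENTS = 60     # $0.60 per low-frequency activity per month
INACTIVE_PIPELINE_CENTS = 100  # $1.00 per inactive pipeline per month


def estimate_monthly_cost_cents(high_freq: int, low_freq: int, inactive: int) -> int:
    """Estimate the monthly charge in cents from activity and pipeline counts."""
    if min(high_freq, low_freq, inactive) < 0:
        raise ValueError("counts must be non-negative")
    return (
        high_freq * HIGH_FREQUENCY_CENTS
        + low_freq * LOW_FREQUENCY_CENTS
        + inactive * INACTIVE_PIPELINE_CENTS
    )


if __name__ == "__main__":
    cents = estimate_monthly_cost_cents(high_freq=2, low_freq=3, inactive=1)
    print(f"${cents // 100}.{cents % 100:02d} per month")
```

For example, two high-frequency activities, three low-frequency activities and one inactive pipeline come to 200 + 180 + 100 = 480 cents, or $4.80 per month.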

1 answer

AWS Data Pipeline between RDS Instances (MySQL)

Is it possible to build a data pipeline in AWS to transfer data between two different RDS MySQL instances? The transfer would be taking place once per day (although not necessarily at the same time every day). I am interested in copying full…

amazon-rds amazon-data-pipeline

asked Sep 26 '17 at 22:29

speedyturkey

2 answers

Unable to establish connection to jdbc:mysql communication link failure

I've been trying to set up a data pipeline between an S3 bucket and an Elastic Beanstalk environment that includes a MySQL RDS instance (all in the same VPC). I get the failure: The last packet sent successfully to the server was 0 milliseconds…

amazon-rds amazon-data-pipeline

asked Sep 19 '17 at 23:24

kan

1 answer

DynamoDB export as gzipped JSON

I exported a DynamoDB table using an AWS Data Pipeline with DataNodes > S3BackupLocation > Compression set to GZIP. I expected compressed output with a .gz extension, but I got uncompressed output with no extension. Further reading reveals that the…

amazon-web-services amazon-s3 hive amazon-dynamodb amazon-data-pipeline

asked Sep 02 '17 at 19:48

Qaz
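The gzip question hinges on whether the exported files are actually compressed, despite having no extension. A gzip stream always begins with the two magic bytes `1f 8b` (RFC 1952), so a downloaded copy can be checked directly.

This Python sketch assumes the export file has already been downloaded locally. The helper names are illustrative.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def is_gzip(data: bytes) -> bool:
    """Return True if the bytes start with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def read_export_bytes(data: bytes) -> bytes:
    """Decompress gzip data; pass plain (uncompressed) data through unchanged."""
    return gzip.decompress(data) if is_gzip(data) else data


if __name__ == "__main__":
    plain = b'{"id": {"s": "1"}}\n'
    print(is_gzip(plain), is_gzip(gzip.compress(plain)))
```

Checking the content rather than the file name avoids trusting a missing or misleading extension.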

1 answer

How to change memory settings for Hive Activity running in AWS data pipeline?

While running a HiveActivity using AWS Data Pipeline, my Hive activity fails with the following error: Diagnostics: Container [pid=,containerID=] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 2.8…

amazon-web-services hadoop hive amazon-emr amazon-data-pipeline

asked Aug 07 '17 at 11:46

Shekhar

1 answer

How to configure AWS data pipeline using serverless.yml?

I am new to both Data Pipeline and Serverless. I want to know how I can automate AWS Data Pipeline using Serverless. Below is my diagram of the AWS Data Pipeline that exports a DynamoDB table to S3.

amazon-web-services amazon-data-pipeline serverless-framework data-pipeline

asked Jul 16 '17 at 05:15

deosha

2 answers

Select data from multiple tables in RDS using DataPipeline

Is there a way (using existing templates) to select data from multiple tables by joining them using AWS Data Pipeline? My use case requires me to combine data from multiple RDS tables to export to Redshift. For example, RDS has tables School, Student,…

amazon-web-services amazon-rds amazon-data-pipeline

asked Jun 21 '17 at 16:43

Adi

1 answer

AWS Data Pipeline EmrClusterForLoad Error

I'm trying to transfer data between S3 and DynamoDB with AWS Data Pipeline. The error message is below: Unable to create resource for @EmrClusterForLoad_2017-05-15T18:51:19 due to: The supplied ami version is invalid. (Service: AmazonElasticMapReduce;…

amazon-web-services amazon-s3 amazon-dynamodb amazon-data-pipeline

asked May 15 '17 at 23:11

Beauspiring

1 answer

AWS Data Pipeline Backup RDS PSQL Data to S3

I tried using AWS Data Pipeline to transfer data from PSQL to S3; however, my activities are failing due to a memory issue: I am getting a Java heap space error. What are the possible ways to transfer data from a PSQL table (25 GB) to…

amazon-web-services amazon-s3 amazon-rds amazon-data-pipeline

asked May 06 '17 at 11:57

Ronak Agrawal

1 answer

How do you run an EmrActivity on an existing EMR cluster?

Is there a way to run an EmrActivity in AWS Data Pipeline on an existing cluster? We currently are using Data Pipeline to run jobs in AWS EMR using EmrCluster and EmrActivity but we'd like to have all pipelines run on the same cluster. I've tried…

amazon-data-pipeline

asked May 04 '17 at 23:41

Mark J Miller

1 answer

AWS Data Pipeline Error

The DynamoDB table backup process using AWS Data Pipeline failed with the following error: 02 May 2017 07:19:04,544 [WARN] (TaskRunnerService-df-0940986HJGYQM1ZJ8BN_@EmrClusterForBackup_2017-04-25T13:31:55-2) df-0940986HJGYQM1ZJ8BN…

amazon-web-services amazon-data-pipeline

asked May 02 '17 at 13:57

Akhil N
