Questions tagged [amazon-data-pipeline]

Simple service to transfer data between Amazon data storage services, kick off Elastic MapReduce jobs, and connect with outside data services.

From the AWS Data Pipeline homepage:

AWS Data Pipeline is a web service that helps you reliably process and move data between different AWS compute and storage services as well as on-premise data sources at specified intervals. With AWS Data Pipeline, you can regularly access your data where it’s stored, transform and process it at scale, and efficiently transfer the results to AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon Elastic MapReduce (EMR).

AWS Data Pipeline helps you easily create complex data processing workloads that are fault tolerant, repeatable, and highly available. You don’t have to worry about ensuring resource availability, managing inter-task dependencies, retrying transient failures or timeouts in individual tasks, or creating a failure notification system. AWS Data Pipeline also allows you to move and process data that was previously locked up in on-premise data silos.

470 questions
2 votes, 1 answer

How to pass EMR cluster ID in the CloudWatch Alarm

I am trying to create an SNS alarm for my EMR cluster so that I get notified when the cluster fails. My issue is that I am not able to pass the cluster ID as JobFlowId in the CloudWatch alarm. I am creating all resources using a CloudFormation template…
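One way the setup described in this question could be sketched: build the alarm resource as a Python dict and let CloudFormation's `Ref` intrinsic supply the cluster ID for the `JobFlowId` dimension. The helper name `emr_alarm_resource`, the logical IDs, the `AppsFailed` metric, and the threshold values are assumptions chosen for illustration, not taken from the question.

```python
import json


def emr_alarm_resource(cluster_logical_id, topic_logical_id, metric_name="AppsFailed"):
    """Build a CloudFormation AWS::CloudWatch::Alarm resource as a plain dict.

    cluster_logical_id and topic_logical_id are the (hypothetical) logical IDs
    of an AWS::EMR::Cluster and an AWS::SNS::Topic in the same template.
    """
    return {
        "Type": "AWS::CloudWatch::Alarm",
        "Properties": {
            "Namespace": "AWS/ElasticMapReduce",
            "MetricName": metric_name,  # assumed metric; pick one that fits your failure mode
            "Dimensions": [
                # Ref on an EMR cluster resource resolves to the cluster ID
                {"Name": "JobFlowId", "Value": {"Ref": cluster_logical_id}}
            ],
            "Statistic": "Sum",
            "Period": 300,
            "EvaluationPeriods": 1,
            "Threshold": 1,
            "ComparisonOperator": "GreaterThanOrEqualToThreshold",
            # Ref on an SNS topic resource resolves to the topic ARN
            "AlarmActions": [{"Ref": topic_logical_id}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(emr_alarm_resource("MyCluster", "MyAlarmTopic"), indent=2))
```

The resulting dict can be placed under `Resources` in a JSON template; whether a metric alarm is the right signal for a failed cluster depends on the workload.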
2 votes, 3 answers

Set 'maxActiveInstances' error

I am using AWS data-pipeline to export a DDB table, but when I activate I get an error: Web service limit exceeded: Exceeded number of concurrent executions. Please set the field 'maxActiveInstances' to a higher value in your pipeline or…
goelakash
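The error message quoted above names the `maxActiveInstances` field. A minimal sketch of setting it on one object of a pipeline definition, assuming the definition is held as a dict with an `objects` list; the object id `TableBackupActivity` and the helper name are hypothetical.

```python
import copy


def with_max_active_instances(definition, object_id, limit):
    """Return a copy of the definition with maxActiveInstances set on one object."""
    updated = copy.deepcopy(definition)  # leave the caller's definition untouched
    for obj in updated.get("objects", []):
        if obj.get("id") == object_id:
            # Definition field values are written as strings here
            obj["maxActiveInstances"] = str(limit)
    return updated


if __name__ == "__main__":
    # Hypothetical definition; a real one would carry many more fields.
    definition = {"objects": [{"id": "TableBackupActivity", "type": "EmrActivity"}]}
    print(with_max_active_instances(definition, "TableBackupActivity", 2))
```

Whether raising the limit resolves the error depends on the pipeline's schedule type and on why executions overlap in the first place.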
2 votes, 0 answers

Issue with copying data from s3 to Redshift

I am trying to sync a table from MySQL RDS to Redshift through Data Pipeline. There was no issue copying data from RDS to S3, but while copying from S3 to Redshift the following issue is seen: amazonaws.datapipeline.taskrunner.TaskExecutionException:…
2 votes, 1 answer

Load props file in EMR Spark Application

I am trying to load custom properties in my Spark application using…
Sanchay
2 votes, 3 answers

AWS Data Pipeline Pricing for On-demand Runs

The AWS Data Pipeline documentation provides the following information on pricing for data pipelines: High frequency activities - $1.00 per month; Low frequency activities - $0.60 per month; Inactive pipelines - $1.00 per month. High frequency activities are…
user_default
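The rates quoted in this question lend themselves to a small worked calculation. The sketch below assumes each rate applies per activity (or per pipeline) per month and uses only the three figures from the excerpt; the category keys and function name are made up for illustration, and current AWS pricing may differ.

```python
# Rates from the excerpt, in cents, to avoid floating-point rounding.
RATES_CENTS = {
    "high_frequency": 100,  # $1.00 per month
    "low_frequency": 60,    # $0.60 per month
    "inactive": 100,        # $1.00 per month
}


def monthly_cost_cents(counts):
    """Sum the monthly cost in cents for a mapping of category -> item count."""
    unknown = set(counts) - set(RATES_CENTS)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return sum(RATES_CENTS[name] * n for name, n in counts.items())


if __name__ == "__main__":
    total = monthly_cost_cents({"high_frequency": 2, "low_frequency": 3, "inactive": 1})
    print(f"${total / 100:.2f} per month")
```

With two high frequency activities, three low frequency activities, and one inactive pipeline, this comes to 200 + 180 + 100 = 480 cents, or $4.80 per month under the stated assumption.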
2 votes, 1 answer

AWS Data Pipeline between RDS Instances (MySQL)

Is it possible to build a data pipeline in AWS to transfer data between two different RDS MySQL instances? The transfer would be taking place once per day (although not necessarily at the same time every day). I am interested in copying full…
speedyturkey
2 votes, 2 answers

Unable to establish connection to jdbc:mysql communication link failure

I've been trying to set up a data pipeline between an S3 bucket and an Elastic Beanstalk environment which includes a MySQL RDS instance (all in the same VPC). I get the failure: The last packet sent successfully to the server was 0 milliseconds…
kan
2 votes, 1 answer

DynamoDB export as gzipped JSON

I exported a DynamoDB table using an AWS Data Pipeline with DataNodes > S3BackupLocation > Compression set to GZIP. I expected compressed output with a .gz extension, but I got uncompressed output with no extension. Further reading reveals that the…
2 votes, 1 answer

How to change memory settings for Hive Activity running in AWS data pipeline?

While running a Hive activity using AWS Data Pipeline, the activity fails with the following error: Diagnostics: Container [pid=,containerID=] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 2.8…
Shekhar
2 votes, 1 answer

How to configure AWS data pipeline using serverless.yml?

I am new to both Data Pipeline and Serverless. I want to know how I can automate AWS Data Pipeline using Serverless. Below is my diagram of an AWS Data Pipeline which exports a DynamoDB table to S3
2 votes, 2 answers

Select data from multiple tables in RDS using DataPipeline

Is there a way (using existing templates) to select data from multiple tables by joining them using AWS Data Pipeline? My use case requires me to combine data from multiple RDS tables to export to Redshift. For example, RDS has tables School, Student,…
Adi
2 votes, 1 answer

AWS Data Pipeline EmrClusterForLoad Error

I'm trying to transfer data between S3 and DynamoDB with AWS Data Pipeline. The error message is below: Unable to create resource for @EmrClusterForLoad_2017-05-15T18:51:19 due to: The supplied ami version is invalid. (Service: AmazonElasticMapReduce;…
2 votes, 1 answer

AWS Data Pipeline Backup RDS PSQL Data to S3

I tried using AWS Data Pipeline to transfer data from PSQL to S3; however, my activities are failing due to a memory issue. I am getting a Java heap space error. What are the possible solutions through which I can transfer data from a PSQL table (25 GB) to…
2 votes, 1 answer

How do you run an EmrActivity on an existing EMR cluster?

Is there a way to run an EmrActivity in AWS Data Pipeline on an existing cluster? We are currently using Data Pipeline to run jobs in AWS EMR using EmrCluster and EmrActivity, but we'd like to have all pipelines run on the same cluster. I've tried…
Mark J Miller
2 votes, 1 answer

AWS Data Pipeline Error

The DynamoDB table backup using AWS Data Pipeline failed with the following error: 02 May 2017 07:19:04,544 [WARN] (TaskRunnerService-df-0940986HJGYQM1ZJ8BN_@EmrClusterForBackup_2017-04-25T13:31:55-2) df-0940986HJGYQM1ZJ8BN…
Akhil N