Questions tagged [amazon-data-pipeline]

Simple service to transfer data between Amazon data storage services, kick off Elastic MapReduce jobs, and connect with outside data services.

From the AWS Data Pipeline homepage:

AWS Data Pipeline is a web service that helps you reliably process and move data between different AWS compute and storage services as well as on-premise data sources at specified intervals. With AWS Data Pipeline, you can regularly access your data where it’s stored, transform and process it at scale, and efficiently transfer the results to AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon Elastic MapReduce (EMR).

AWS Data Pipeline helps you easily create complex data processing workloads that are fault tolerant, repeatable, and highly available. You don’t have to worry about ensuring resource availability, managing inter-task dependencies, retrying transient failures or timeouts in individual tasks, or creating a failure notification system. AWS Data Pipeline also allows you to move and process data that was previously locked up in on-premise data silos.

470 questions
3 votes • 1 answer

Multiple inputs for EmrActivity

According to the Data Pipeline documentation, the EmrActivity step command uses a different format than a regular EMR job. Here is a simplified…
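For context, EmrActivity steps in a pipeline definition are comma-delimited rather than space-delimited. A simplified sketch (the ids, bucket names, and paths are placeholders, not taken from the question):

```json
{
  "id": "MyEmrActivity",
  "type": "EmrActivity",
  "runsOn": { "ref": "MyEmrCluster" },
  "step": "/home/hadoop/contrib/streaming/hadoop-streaming.jar,-input,s3n://example-bucket/input,-output,s3n://example-bucket/output"
}
```

Each comma-separated token becomes one argument of the step, which is why a command copied verbatim from a stand-alone EMR job usually fails here.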
3 votes • 1 answer

Amazon Data Pipeline: When does ShellCommandActivity start the On Fail Action?

How does AWS Data Pipeline determine whether a ShellCommandActivity fails, and when does it start the corresponding onFail action? Can I write code in the script which checks whether the actions were done correctly and then "tells" the pipeline that…
Biffy • 871
3 votes • 2 answers

Automating Hive Activity using aws

I would like to automate my Hive script to run every day; one option for that is Data Pipeline. The problem is that I am exporting data from DynamoDB to S3 and then manipulating this data with a Hive script. I am giving…
Ducaz035 • 3,054
2 votes • 0 answers

How to connect AWS RDS for SQL Server to ODBC data sources via Linked Server connections?

Setup: currently we are using SQL Server installed on an EC2 instance as our central data warehouse. We pull in data from a long list of data sources. This is done via SQL Agent jobs that execute stored procedures querying the data sources. The…
2 votes • 1 answer

AWS Data Pipeline Dynamo to Redshift

I have an issue: I need to migrate data from DynamoDB to Redshift, but I receive the following exception: ERROR: Unsupported Data Type: Current Version only supports Strings and Numbers Detail: …
2 votes • 1 answer

AWS data pipeline name tag option for EC2 resource

I'm running a shell activity on an EC2 resource. Sample JSON for creating the EC2 resource: { "id" : "MyEC2Resource", "type" : "Ec2Resource", "actionOnTaskFailure" : "terminate", "actionOnResourceFailure" : "retryAll", "maximumRetries" : "1", …
2 votes • 1 answer

How to catch Spark error from shell script

I have a pipeline in AWS Data Pipeline that runs a shell script named shell.sh: $ spark-submit transform_json.py Running command on cluster... [54.144.10.162] Running command... [52.206.87.30] Running command... [54.144.10.162] Command…
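A common cause of this problem is that Data Pipeline only sees the wrapper script's own exit status, so the wrapper has to propagate the job's failure rather than swallow it. A minimal sketch in Python (the spark-submit invocation is the assumed job; in shell.sh itself the equivalent fix is to end with `exit $?` right after spark-submit, or to run the script under `set -e`):

```python
import subprocess
import sys

def submit(cmd):
    """Run a job command and surface its exit code instead of swallowing it.

    Data Pipeline marks the activity FAILED only if this wrapper itself
    exits nonzero, so the job's failure must be propagated.
    """
    try:
        subprocess.run(cmd, check=True)
        return 0
    except subprocess.CalledProcessError as err:
        print(f"job failed with exit code {err.returncode}", file=sys.stderr)
        return err.returncode

# At the end of the real wrapper you would do something like:
#   sys.exit(submit(["spark-submit", "transform_json.py"]))
```
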
2 votes • 1 answer

Data Pipeline (DynamoDB to S3) - How to format S3 file?

I have a Data Pipeline that exports my DynamoDB table to an S3 bucket so I can use the S3 file for services like QuickSight, Athena and Forecast. However, for my S3 file to work with these services, I need the file to be formatted in a csv like…
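One way to get a CSV for QuickSight/Athena is to post-process the export. The sketch below assumes the DynamoDB-JSON shape (one `{"Item": {...}}` object per line, with typed values such as `{"S": "a1"}`); the Data Pipeline export template's native format differs slightly, so the parsing step would need adapting:

```python
import csv
import io
import json

def untype(attr):
    """Unwrap one DynamoDB-typed value, e.g. {"S": "a1"} or {"N": "9.5"}."""
    (kind, value), = attr.items()
    return value

def export_to_csv(lines, fieldnames):
    """Convert export lines (one typed-JSON item per line) to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for line in lines:
        item = json.loads(line).get("Item", {})
        writer.writerow({k: untype(v) for k, v in item.items() if k in fieldnames})
    return out.getvalue()
```

This could run as a small follow-up step (a Lambda or a ShellCommandActivity) over each exported file.
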
2 votes • 1 answer

Data Pipeline & EMR error: No default VPC found. But I'm not authorized to create default VPC

I need to export a DynamoDB table to an S3 bucket. I've created a Data Pipeline, but it's stuck in Waiting for runner status so I checked the runsOn value and it says "EmrClusterForBackup". Then I checked EMR and for the cluster…
2 votes • 4 answers

'm3.xlarge' is not supported in AWS Data Pipeline

I am new to AWS and trying to run an AWS Data Pipeline that loads data from DynamoDB to S3, but I am getting the error below. Please help. Unable to create resource for @EmrClusterForBackup_2020-05-01T14:18:47 due to: Instance type 'm3.xlarge' is not…
NikRED • 1,175
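The export template generates an EmrCluster object pinned to an older instance type that newer regions no longer offer. A common workaround, sketched below (field names follow the EmrCluster object in a pipeline definition; the release label and instance type are assumptions and must be ones supported in your region), is to edit the generated cluster object:

```json
{
  "id": "EmrClusterForBackup",
  "type": "EmrCluster",
  "releaseLabel": "emr-5.23.0",
  "masterInstanceType": "m4.xlarge",
  "coreInstanceType": "m4.xlarge",
  "coreInstanceCount": "1"
}
```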
2 votes • 1 answer

AWS Data Pipeline can't validate S3 Access [permission warning]

I am doing an evaluation of AWS database services to pick the most effective one; the objective is to load data from a JSON file in an S3 bucket into Redshift every 5 minutes. I am currently trying to use AWS Data Pipeline for the automation of…
2 votes • 1 answer

DynamoDB data loading is too slow and not respecting provisioned write capacity

I have exported and transformed 340 million rows from DynamoDB into S3. I am now trying to import them back into DynamoDB using the Data Pipeline. I have my table write provisioning set to 5600 capacity units and I can't seem to get the pipeline to…
Garet Jax • 1,091
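As a rough sanity check (pure arithmetic, not a claim about this pipeline's actual bottleneck): assuming each item consumes one write capacity unit (items of 1 KB or less), 340 million items at 5,600 WCU cannot finish faster than about 17 hours, and the import template's write throughput ratio parameter caps the rate the mappers target below even that:

```python
ITEMS = 340_000_000   # rows to import
WCU = 5_600           # provisioned write capacity units
WRITE_RATIO = 1.0     # assumed throughput ratio (template parameter)

# Floor on total import time: items / (WCU * ratio), in seconds.
seconds = ITEMS / (WCU * WRITE_RATIO)
hours = seconds / 3600
# roughly 16.9 hours even at full utilisation of the provisioned capacity
```
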
2 votes • 0 answers

AWS Datapipeline Canceled Task Exception thrown after running for 5 days

I have been trying to run an AWS Data Pipeline that calls a bash process, which in turn calls several long-running Python and Java processes from a shell command activity. Each time the shell command activity runs, a reportProgress error is thrown in the Task…
Matthew • 21
2 votes • 2 answers

Export data from a MariaDB RDS table to S3 - Data Pipeline failing

My goal is to export a large (~300GB) table to a csv/tsv in S3 for long term storage (basically, if someone WANTS to look at it in years to come, they can, but it is not required to be available online). I need to copy JUST THIS ONE TABLE, not the…
zaitsman • 8,984
2 votes • 0 answers

Import from S3 to MySQL RDS and create tables for aws?

I am new to AWS RDS. I am trying to import several CSV files into MySQL RDS with the Data Pipeline service. I chose the "Load S3 data into RDS MySQL table" template, but the parameters are where I have a headache. My data is in my…