Questions tagged [amazon-data-pipeline]

Simple service to transfer data between Amazon data storage services, kick off Elastic MapReduce jobs, and connect with outside data services.

From the AWS Data Pipeline homepage:

AWS Data Pipeline is a web service that helps you reliably process and move data between different AWS compute and storage services as well as on-premise data sources at specified intervals. With AWS Data Pipeline, you can regularly access your data where it’s stored, transform and process it at scale, and efficiently transfer the results to AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon Elastic MapReduce (EMR).

AWS Data Pipeline helps you easily create complex data processing workloads that are fault tolerant, repeatable, and highly available. You don’t have to worry about ensuring resource availability, managing inter-task dependencies, retrying transient failures or timeouts in individual tasks, or creating a failure notification system. AWS Data Pipeline also allows you to move and process data that was previously locked up in on-premise data silos.

470 questions
3 votes • 1 answer

Multiple inputs for EmrActivity

According to the Data Pipeline documentation, the EmrActivity step command uses a different format than a regular EMR job. Here is a simplified…
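For context, EmrActivity steps in a pipeline definition are comma-delimited rather than space-delimited. A simplified sketch (the ids, bucket names, and paths are placeholders, not taken from the question):

```json
{
  "id": "MyEmrActivity",
  "type": "EmrActivity",
  "runsOn": { "ref": "MyEmrCluster" },
  "step": "/home/hadoop/contrib/streaming/hadoop-streaming.jar,-input,s3n://example-bucket/input,-output,s3n://example-bucket/output"
}
```

Each comma-separated token becomes one argument of the step, which is why a command copied verbatim from a stand-alone EMR job usually fails here.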
3 votes • 1 answer

Amazon Data Pipeline: When does ShellCommandActivity start the On Fail Action?

How does AWS Data Pipeline determine whether a ShellCommandActivity fails, and when does it start the corresponding onFail action? Can I write code in the script which checks whether the actions were done correctly and then "tells" the pipeline that…
Biffy • 871
3 votes • 2 answers

Automating Hive Activity using aws

I would like to automate my Hive script to run every day; one option for that is Data Pipeline. The problem is that I am exporting data from DynamoDB to S3 and then manipulating this data with a Hive script. I am giving…
Ducaz035 • 3,054
2 votes • 0 answers

How to connect AWS RDS for SQL Server to ODBC data sources via Linked Server connections?

Setup: currently we are using SQL Server installed on an EC2 instance as our central data warehouse. We pull in data from a long list of data sources. This is done via SQL Agent jobs that execute stored procedures querying the data sources. The…
2 votes • 1 answer

AWS Data Pipeline Dynamo to Redshift

I have an issue: I need to migrate data from DynamoDB to Redshift, but I receive the following exception: ERROR: Unsupported Data Type: Current Version only supports Strings and Numbers Detail: …
2 votes • 1 answer

AWS data pipeline name tag option for EC2 resource

I'm running a shell activity on an EC2 resource. Sample JSON for creating the EC2 resource: { "id" : "MyEC2Resource", "type" : "Ec2Resource", "actionOnTaskFailure" : "terminate", "actionOnResourceFailure" : "retryAll", "maximumRetries" : "1", …
2 votes • 1 answer

How to catch Spark error from shell script

I have a pipeline in AWS Data Pipeline that runs a shell script named shell.sh: $ spark-submit transform_json.py Running command on cluster... [54.144.10.162] Running command... [52.206.87.30] Running command... [54.144.10.162] Command…
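A common cause of this problem is that Data Pipeline only sees the wrapper script's own exit status, so the wrapper has to propagate the job's failure rather than swallow it. A minimal sketch in Python (the spark-submit invocation is the assumed job; in shell.sh itself the equivalent fix is to end with `exit $?` right after spark-submit, or to run the script under `set -e`):

```python
import subprocess
import sys

def submit(cmd):
    """Run a job command and surface its exit code instead of swallowing it.

    Data Pipeline marks the activity FAILED only if this wrapper itself
    exits nonzero, so the job's failure must be propagated.
    """
    try:
        subprocess.run(cmd, check=True)
        return 0
    except subprocess.CalledProcessError as err:
        print(f"job failed with exit code {err.returncode}", file=sys.stderr)
        return err.returncode

# At the end of the real wrapper you would do something like:
#   sys.exit(submit(["spark-submit", "transform_json.py"]))
```
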
2 votes • 1 answer

Data Pipeline (DynamoDB to S3) - How to format S3 file?

I have a Data Pipeline that exports my DynamoDB table to an S3 bucket so I can use the S3 file for services like QuickSight, Athena and Forecast. However, for my S3 file to work with these services, I need the file to be formatted in a csv like…
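One way to get a CSV for QuickSight/Athena is to post-process the export. The sketch below assumes the DynamoDB-JSON shape (one `{"Item": {...}}` object per line, with typed values such as `{"S": "a1"}`); the Data Pipeline export template's native format differs slightly, so the parsing step would need adapting:

```python
import csv
import io
import json

def untype(attr):
    """Unwrap one DynamoDB-typed value, e.g. {"S": "a1"} or {"N": "9.5"}."""
    (kind, value), = attr.items()
    return value

def export_to_csv(lines, fieldnames):
    """Convert export lines (one typed-JSON item per line) to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for line in lines:
        item = json.loads(line).get("Item", {})
        writer.writerow({k: untype(v) for k, v in item.items() if k in fieldnames})
    return out.getvalue()
```

This could run as a small follow-up step (a Lambda or a ShellCommandActivity) over each exported file.
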
2 votes • 1 answer

Data Pipeline & EMR error: No default VPC found. But I'm not authorized to create default VPC

I need to export a DynamoDB table to an S3 bucket. I've created a Data Pipeline, but it's stuck in Waiting for runner status so I checked the runsOn value and it says "EmrClusterForBackup". Then I checked EMR and for the cluster…
2 votes • 4 answers

'm3.xlarge' is not supported in AWS Data Pipeline

I am new to AWS and trying to run an AWS Data Pipeline that loads data from DynamoDB to S3, but I am getting the error below. Please help. Unable to create resource for @EmrClusterForBackup_2020-05-01T14:18:47 due to: Instance type 'm3.xlarge' is not…
NikRED • 1,175
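The export template generates an EmrCluster object pinned to an older instance type that newer regions no longer offer. A common workaround, sketched below (field names follow the EmrCluster object in a pipeline definition; the release label and instance type are assumptions and must be ones supported in your region), is to edit the generated cluster object:

```json
{
  "id": "EmrClusterForBackup",
  "type": "EmrCluster",
  "releaseLabel": "emr-5.23.0",
  "masterInstanceType": "m4.xlarge",
  "coreInstanceType": "m4.xlarge",
  "coreInstanceCount": "1"
}
```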
2 votes • 1 answer

AWS Data Pipeline can't validate S3 Access [permission warning]

I am doing an evaluation of AWS database services to pick the most effective one; the objective is to load data from a JSON file in an S3 bucket into Redshift every 5 minutes. I am currently trying to use AWS Data Pipeline for the automation of…
2 votes • 1 answer

DynamoDB data loading is too slow and not respecting provisioned write capacity

I have exported and transformed 340 million rows from DynamoDB into S3. I am now trying to import them back into DynamoDB using the Data Pipeline. I have my table write provisioning set to 5600 capacity units and I can't seem to get the pipeline to…
Garet Jax • 1,091
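As a rough sanity check (pure arithmetic, not a claim about this pipeline's actual bottleneck): assuming each item consumes one write capacity unit (items of 1 KB or less), 340 million items at 5,600 WCU cannot finish faster than about 17 hours, and the import template's write throughput ratio parameter caps the rate the mappers target below even that:

```python
ITEMS = 340_000_000   # rows to import
WCU = 5_600           # provisioned write capacity units
WRITE_RATIO = 1.0     # assumed throughput ratio (template parameter)

# Floor on total import time: items / (WCU * ratio), in seconds.
seconds = ITEMS / (WCU * WRITE_RATIO)
hours = seconds / 3600
# roughly 16.9 hours even at full utilisation of the provisioned capacity
```
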
2 votes • 0 answers

AWS Datapipeline Canceled Task Exception thrown after running for 5 days

I have been trying to run an AWS Data Pipeline that calls a bash process, which in turn calls several long-running Python and Java processes from a shell command activity. Each time the shell command activity runs, a reportProgress error is thrown in the Task…
Matthew • 21
2 votes • 2 answers

Export data from a MariaDB RDS table to S3 - Data Pipeline failing

My goal is to export a large (~300GB) table to a csv/tsv in S3 for long term storage (basically, if someone WANTS to look at it in years to come, they can, but it is not required to be available online). I need to copy JUST THIS ONE TABLE, not the…
zaitsman • 8,984
2 votes • 0 answers

Import from S3 to MySQL RDS and create tables for aws?

I am new to AWS RDS. I am trying to import several CSV files into MySQL RDS with the Data Pipeline service. I chose the "Load S3 data into RDS MySQL table" template, but the parameters are where I have a headache. My data is in my…