Questions tagged [amazon-data-pipeline]

Simple service to transfer data between Amazon data storage services, kick off Elastic MapReduce jobs, and connect with outside data services.

From the AWS Data Pipeline homepage:

AWS Data Pipeline is a web service that helps you reliably process and move data between different AWS compute and storage services as well as on-premises data sources at specified intervals. With AWS Data Pipeline, you can regularly access your data where it’s stored, transform and process it at scale, and efficiently transfer the results to AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon Elastic MapReduce (EMR).

AWS Data Pipeline helps you easily create complex data processing workloads that are fault tolerant, repeatable, and highly available. You don’t have to worry about ensuring resource availability, managing inter-task dependencies, retrying transient failures or timeouts in individual tasks, or creating a failure notification system. AWS Data Pipeline also allows you to move and process data that was previously locked up in on-premises data silos.

470 questions
0
votes
1 answer

[amazon-data-pipeline]: Can we not clone an existing AWS Data Pipeline from the Java SDK?

I want to clone my existing pipeline through code, and I just had a quick walk through the AWS Data Pipeline documentation here. I couldn't find a method to clone an existing pipeline in the SDK. Why is that? Can somebody please answer?
tyro
  • 577
  • 8
  • 17
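There is indeed no single "clone" call in the Data Pipeline API; the console's Clone button is a composite of three API operations. A minimal sketch of the same sequence (shown in Python for brevity, but the Java SDK exposes the same three calls; the pipeline IDs and names below are hypothetical):

```python
def clone_pipeline(client, source_pipeline_id, new_name, unique_id):
    """Clone a pipeline: read its definition, create an empty
    pipeline, then put the copied definition into the new one."""
    # Fetch the source definition (objects, parameters, values).
    definition = client.get_pipeline_definition(pipelineId=source_pipeline_id)
    # Create the empty target pipeline.
    created = client.create_pipeline(name=new_name, uniqueId=unique_id)
    new_id = created["pipelineId"]
    # Copy the definition across.
    client.put_pipeline_definition(
        pipelineId=new_id,
        pipelineObjects=definition.get("pipelineObjects", []),
        parameterObjects=definition.get("parameterObjects", []),
        parameterValues=definition.get("parameterValues", []),
    )
    return new_id

# Usage (requires AWS credentials; region is an example):
# import boto3
# client = boto3.client("datapipeline", region_name="us-east-1")
# clone_pipeline(client, "df-0123456789ABCDEF", "my-pipeline-copy", "my-pipeline-copy-1")
```

The cloned pipeline still has to be activated separately, just as in the console.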
0
votes
1 answer

Configure and Deploy Lambda Pipeline in code

I was wondering if there are any AWS services or projects which allow us to configure a data pipeline using AWS Lambdas in code. I am looking for something like the below. Assume there is a library called pipeline: from pipeline import connect, s3,…
RAbraham
  • 5,956
  • 8
  • 45
  • 80
0
votes
1 answer

AWS Datapipeline to Ruby Code

I am in the process of taking over a set of data pipelines on AWS. They are all built using the AWS graphical editor tool. The pipelines are getting complex, and my goal is to move them to code and have them versioned. We are a Ruby shop, so besides…
RidingRails
  • 782
  • 3
  • 7
  • 21
0
votes
1 answer

AWS Data Pipeline - Java SDK - How to put a pipeline definition from JSON file

I have an AWS Data Pipeline definition in JSON format. Using the Java SDK, I have created an empty pipeline, and now I would like to use my JSON to put the pipeline definition. Basically, I would like to create a PutPipelineDefinitionRequest…
Erica
  • 1,608
  • 2
  • 21
  • 32
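One common stumbling block here: the JSON produced by the console's Export feature (a top-level `"objects"` array with plain key/value fields) is not the shape the low-level API expects, which is a list of pipeline objects each holding `fields` entries with `key` plus either `stringValue` or `refValue`. A hedged sketch of that conversion (Python for brevity; the Java SDK's `PipelineObject`/`Field` classes take the same shape, and the sample object names are hypothetical):

```python
def to_api_objects(definition_json):
    """Convert console-style definition JSON ({"objects": [...]})
    into the objects/fields shape the low-level API expects."""
    api_objects = []
    for obj in definition_json.get("objects", []):
        fields = []
        for key, value in obj.items():
            if key in ("id", "name"):
                continue  # id/name are top-level, not fields
            if isinstance(value, dict) and "ref" in value:
                # References to other pipeline objects use refValue.
                fields.append({"key": key, "refValue": value["ref"]})
            else:
                # Everything else is passed as a string value.
                fields.append({"key": key, "stringValue": str(value)})
        api_objects.append(
            {"id": obj["id"], "name": obj.get("name", obj["id"]), "fields": fields}
        )
    return api_objects
```

The result can be passed as the `pipelineObjects` argument of the put-pipeline-definition call; fields whose values are lists need one field entry per list element, which this sketch does not handle.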
0
votes
1 answer

Create aggregate table in AWS Data Pipeline

I have granular data stored in Redshift. I want an aggregate table created regularly, and I'm seeking to use AWS Data Pipeline to do this. Let's say, for the sake of conversation, that I have a table of all flights. I want to generate a table of airports and the…
ScottieB
  • 3,958
  • 6
  • 42
  • 60
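For this pattern, Data Pipeline's SqlActivity can run an aggregate-and-reload script against Redshift on a schedule. A sketch of what such an activity object and its script might look like, using the question's flights example (all table, column, and object names here are hypothetical):

```python
# SQL the activity would run on each scheduled execution: rebuild the
# aggregate inside one transaction so readers never see a half-loaded table.
AGGREGATE_SQL = """
BEGIN;
DELETE FROM airport_summary;
INSERT INTO airport_summary (airport, flight_count)
SELECT airport, COUNT(*) FROM flights GROUP BY airport;
COMMIT;
"""

# Pipeline object in the console/CLI definition shape.
sql_activity = {
    "id": "AggregateActivity",
    "name": "AggregateActivity",
    "type": "SqlActivity",
    "database": {"ref": "RedshiftDatabase"},  # ref to a RedshiftDatabase object
    "script": AGGREGATE_SQL,
    "schedule": {"ref": "DailySchedule"},     # ref to a Schedule object
}
```

The referenced `RedshiftDatabase` and `DailySchedule` objects still need to be defined in the same pipeline.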
0
votes
1 answer

AWS pipeline parameter error

I have created a pipeline to load data from S3 to an RDS MySQL instance. I can save the pipeline without any errors, but on activation I get the error "No value specified for parameter 1". My online search so far has suggested that the insert statement…
Akshata T
  • 37
  • 5
0
votes
2 answers

How to run ShellCommandActivity on my own EC2 instance?

I am trying to run a simple command to test a ShellCommandActivity with Data Pipeline from AWS: >>> /usr/bin/python /home/ubuntu/script.py That script should create a file on S3. I know I could create an S3 file using the same Data Pipeline, but I…
Gocht
  • 9,924
  • 3
  • 42
  • 81
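The usual way to run an activity on your own instance, rather than one Data Pipeline provisions, is to install Task Runner on that instance and give the activity a `workerGroup` instead of a `runsOn` resource. A sketch of the activity object (the worker-group name is hypothetical):

```python
# On the EC2 instance itself, Task Runner is started with a matching group, e.g.:
#   java -jar TaskRunner-1.0.jar --config credentials.json --workerGroup=my-worker-group
shell_activity = {
    "id": "RunScript",
    "name": "RunScript",
    "type": "ShellCommandActivity",
    "command": "/usr/bin/python /home/ubuntu/script.py",
    "workerGroup": "my-worker-group",   # must match the Task Runner flag
    "schedule": {"ref": "DailySchedule"},
}
```

With `workerGroup` set, the pipeline hands the task to whichever Task Runner polls with that group name, so no `Ec2Resource` object is needed.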
0
votes
1 answer

Are there any open source scheduling tools for AWS Redshift?

I have a few SQL statements (insert, update, delete, truncate) to be executed in a transaction every 5 minutes. I cannot use AWS Data Pipeline, since the minimum scheduling interval for a pipeline is 15 minutes. Are there any open source tools I can use? Can I…
0
votes
1 answer

Map attributes in DynamoDB table while migrating data

I have two DynamoDB tables with the following attributes: Table_1: SomeId (string), Name (string); Table_2: Id (string), Name (string), Surname (string). This is what I need: migrate the data from Table_1 to Table_2, and map the Table_1.SomeId attribute to the…
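If the tables are small enough to migrate outside Data Pipeline, a plain scan-transform-write loop does the renaming. A minimal sketch, assuming the attribute mapping described above and boto3 `Table` resources passed in by the caller:

```python
def map_item(item):
    """Map a Table_1 item onto the Table_2 shape: SomeId -> Id,
    Name carried over (Surname does not exist in the source)."""
    return {"Id": item["SomeId"], "Name": item["Name"]}

def migrate(table_1, table_2):
    """Scan Table_1 page by page and batch-write mapped items to Table_2."""
    page = table_1.scan()
    while True:
        with table_2.batch_writer() as batch:
            for item in page["Items"]:
                batch.put_item(Item=map_item(item))
        if "LastEvaluatedKey" not in page:
            break  # no more pages
        page = table_1.scan(ExclusiveStartKey=page["LastEvaluatedKey"])

# Usage (requires AWS credentials):
# import boto3
# ddb = boto3.resource("dynamodb")
# migrate(ddb.Table("Table_1"), ddb.Table("Table_2"))
```

For large tables, the same mapping would go into the transform step of a Data Pipeline or EMR job instead.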
0
votes
3 answers

Archive Dynamodb based on date/days

I want to archive a DynamoDB table, keeping data only for 90 days. I have a field called recorded_on in the table which I can use to track the 90 days. I looked at Data Pipeline, and it seems like overkill with EMR since we don't need it. Any better ways to…
user3089927
  • 3,575
  • 8
  • 25
  • 33
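One EMR-free approach is DynamoDB's native TTL feature: store an epoch-seconds expiry attribute on each item and let DynamoDB delete items once that time passes (optionally capturing the deletions from Streams for archival). The date arithmetic is the only moving part; a sketch, with the attribute semantics assumed from the question's `recorded_on` field:

```python
from datetime import datetime, timedelta, timezone

def ttl_epoch(recorded_on_epoch, days=90):
    """Value for a TTL attribute: expire the item `days` after recorded_on
    (both as epoch seconds)."""
    return recorded_on_epoch + days * 86400

def cutoff_iso(days=90, now=None):
    """For a one-off sweep instead: items whose recorded_on is older than
    this ISO timestamp are past the retention window."""
    now = now or datetime.now(timezone.utc)
    return (now - timedelta(days=days)).isoformat()
```

TTL deletions are free and eventually consistent (items may linger briefly past expiry), which is usually acceptable for a 90-day archive policy.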
0
votes
1 answer

Datapipeline and Mule

Which version of Mule will be compatible with the AWS SDK for Data Pipeline? We need to activate Data Pipeline through the Enterprise version of Mule. Since it's cloud, we can use any Mule version from 3.6 onwards. Please help us. Right now the jar versions are causing…
0
votes
1 answer

How to reference the Amazon Data Pipeline name?

Is it possible to use the name of an Amazon Data Pipeline as a variable inside the Data Pipeline itself? If yes, how can you do that?
Mihai Tache
  • 161
  • 3
  • 9
0
votes
1 answer

How do I call a stored procedure in SQL Server with Data Pipeline in ShellCommandActivity (AWS Data Pipeline)

I know you can call a MySQL procedure with the script below, but is the same possible for SQL Server? mysql --host host_url --port port_number --user username --password password --execute="CALL stored_proc_name;" I have SQL Server Express, and need…
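The SQL Server counterpart of that mysql one-liner is the `sqlcmd` utility, which would have to be installed on whatever resource runs the ShellCommandActivity. A sketch that builds the equivalent command (server, credentials, and procedure name are placeholders, mirroring the question's):

```python
def sqlcmd_call(server, port, user, password, proc):
    """Build the sqlcmd equivalent of the mysql one-liner:
    -S server,port selects the host, -Q runs the query and exits."""
    return ["sqlcmd", "-S", f"{server},{port}", "-U", user,
            "-P", password, "-Q", f"EXEC {proc}"]

# As a ShellCommandActivity command this would be a single line, e.g.:
#   sqlcmd -S host_url,1433 -U username -P password -Q "EXEC stored_proc_name"
```

Note that SQL Server Express must have TCP/IP connections enabled for a remote `sqlcmd` call to reach it.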
0
votes
1 answer

AWS Data Pipeline MySQL Nulls sed shell command activity MIGRAINE

There is the following scenario: a SQL table needs to be transferred to a MySQL database daily. I tried Data Pipeline using CopyActivity, but the exported CSV has empty spaces instead of \N or NULLs, so MySQL imports those fields as "" which is not…
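Since MySQL's LOAD DATA treats `\N` as NULL but an empty field as an empty string, a small transform step between the export and the import can rewrite empty fields. A CSV-aware version is safer than a chain of sed substitutions (which misses consecutive empty fields); a sketch:

```python
import csv
import io

def nullify(csv_text):
    """Rewrite empty CSV fields as \\N so MySQL's LOAD DATA
    imports them as NULL instead of empty strings."""
    out = io.StringIO()
    writer = csv.writer(out)
    for row in csv.reader(io.StringIO(csv_text)):
        writer.writerow([r"\N" if field == "" else field for field in row])
    return out.getvalue()
```

In a pipeline this could run as a ShellCommandActivity between the CopyActivity's S3 output and the MySQL load; quoted fields are handled correctly because the csv module parses them before the emptiness check.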
0
votes
1 answer

Passing variables between EC2 instances in multi-step AWS data pipeline

I have a pipeline setup wherein I have 3 main stages: 1) take input from a zipped file, unzip this file in S3, run some basic verification on each file to guarantee its integrity, move to step 2; 2) kick off 2 simultaneous processing tasks on…