Questions tagged [amazon-data-pipeline]

Simple service to transfer data between Amazon data storage services, kick off Elastic MapReduce jobs, and connect with outside data services.

From the AWS Data Pipeline homepage:

AWS Data Pipeline is a web service that helps you reliably process and move data between different AWS compute and storage services, as well as on-premises data sources, at specified intervals. With AWS Data Pipeline, you can regularly access your data where it’s stored, transform and process it at scale, and efficiently transfer the results to AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon Elastic MapReduce (EMR).

AWS Data Pipeline helps you easily create complex data processing workloads that are fault tolerant, repeatable, and highly available. You don’t have to worry about ensuring resource availability, managing inter-task dependencies, retrying transient failures or timeouts in individual tasks, or creating a failure notification system. AWS Data Pipeline also allows you to move and process data that was previously locked up in on-premises data silos.

470 questions
0
votes
1 answer

Create a data pipeline with tags using boto.datapipeline

I want to create an AWS data pipeline with tags. We are using the boto.datapipeline API to create the pipeline. These tags are used to give data pipeline users read/write access via IAM management. Please provide the code syntax to create a…
santhoshc
  • 29
  • 5
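
A minimal sketch for the scenario above. The question targets the older boto.datapipeline API, but boto3's Data Pipeline client accepts tags directly at creation time; the pipeline name, region, and tag keys below are hypothetical placeholders:

```python
import boto3

# Hypothetical names and tags; region/credentials assumed already configured.
client = boto3.client("datapipeline", region_name="us-east-1")

response = client.create_pipeline(
    name="nightly-export",            # hypothetical pipeline name
    uniqueId="nightly-export-v1",     # idempotency token
    tags=[
        {"key": "team_id", "value": "data-eng"},
        {"key": "env", "value": "prod"},
    ],
)
print(response["pipelineId"])
```

With tags in place, an IAM policy can condition pipeline access on them, which is how the per-user read/write scoping described in the question is typically done.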
0
votes
1 answer

SqlActivity for Redshift copy in Amazon Data Pipeline does not pick up wildcard characters in filenames

I am using the SqlActivity in Amazon Data Pipeline to copy data to my Redshift table. The script runs fine if I specify one fileName like part-00000.gz, but when I specify the wildcard *.gz to pick up all files in the directory, I get the error where the…
user2330278
  • 67
  • 10
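
Worth noting for this scenario: Redshift's COPY treats the S3 path as a key prefix rather than a shell glob, so a wildcard like *.gz is never expanded. A hedged sketch of the prefix-based alternative, with hypothetical bucket, table, and IAM role names:

```python
# Redshift COPY matches the FROM path as a key prefix, not a glob, so
# 's3://my-bucket/output/part-' loads part-00000.gz, part-00001.gz, ...
# Bucket, table, and role below are hypothetical placeholders.
copy_sql = """
COPY my_table
FROM 's3://my-bucket/output/part-'
CREDENTIALS 'aws_iam_role=arn:aws:iam::123456789012:role/RedshiftCopyRole'
GZIP
DELIMITER '|';
"""
print(copy_sql)  # e.g. paste into the SqlActivity 'script' field
```

A manifest file is the other common route when the files don't share a usable prefix.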
0
votes
1 answer

AWS data pipeline - pull data from external source?

I'm trying to use an AWS data pipeline to pull data from an externally hosted MySQL data source into an RDS MySQL instance. Is this even possible? How can it be configured? I can't find anything about this in the documentation. If it's not possible,…
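
External JDBC sources can, as far as I know, be referenced through a JdbcDatabase pipeline object. A minimal, unverified sketch of registering one with boto3 follows; the host, credentials, and pipeline ID are hypothetical, and a complete definition would still need data nodes, a CopyActivity, and a resource to run on:

```python
import boto3

client = boto3.client("datapipeline", region_name="us-east-1")

# Hypothetical external MySQL host and credentials.
external_db = {
    "id": "ExternalMySql",
    "name": "ExternalMySql",
    "fields": [
        {"key": "type", "stringValue": "JdbcDatabase"},
        {"key": "connectionString",
         "stringValue": "jdbc:mysql://db.example.com:3306/mydb"},
        {"key": "jdbcDriverClass", "stringValue": "com.mysql.jdbc.Driver"},
        {"key": "username", "stringValue": "reader"},
        {"key": "*password", "stringValue": "secret"},
    ],
}

client.put_pipeline_definition(
    pipelineId="df-00000000000000000000",  # hypothetical pipeline ID
    pipelineObjects=[external_db],         # plus the rest of the definition
)
```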
0
votes
2 answers

Incremental Load in Redshift

We are currently working on loading data into Redshift. We have different scenarios here. If the OLTP database is SQL Server residing on premises, then we can consider a tool like Attunity that can help load data into Redshift via S3. Attunity is…
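
For the incremental-load part, a common Redshift pattern is to COPY the new rows into a staging table and merge from there. A sketch with hypothetical table, column, and role names:

```python
# Classic staging-table merge for incremental loads: COPY the delta into a
# staging table, delete matching keys from the target, then insert.
# All table/column names and the IAM role are hypothetical.
merge_sql = """
BEGIN;
COPY staging_orders
FROM 's3://my-bucket/incremental/orders/'
CREDENTIALS 'aws_iam_role=arn:aws:iam::123456789012:role/RedshiftCopyRole'
GZIP DELIMITER '|';

DELETE FROM orders
USING staging_orders
WHERE orders.order_id = staging_orders.order_id;

INSERT INTO orders SELECT * FROM staging_orders;
DELETE FROM staging_orders;  -- TRUNCATE would implicitly commit, so DELETE
COMMIT;
"""
print(merge_sql)  # e.g. feed this to a SqlActivity or a psycopg2 cursor
```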
0
votes
1 answer

Running AWS commands from commandline on a ShellCommandActivity

My original problem was that I want to increase my DynamoDB write throughput before I run the pipeline, and then decrease it when I'm done uploading (doing it at most once a day, so I'm fine with the limitations on decreases). The only way I found to do…
Zach Moshe
  • 2,782
  • 4
  • 24
  • 40
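
A sketch of the throughput bracketing described above, using boto3; the same call is exposed on the CLI as `aws dynamodb update-table`, which a ShellCommandActivity could invoke. Table name and capacity numbers are hypothetical, and DynamoDB historically capped how often throughput could be decreased per day:

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

def set_throughput(table, read, write):
    # Adjust provisioned capacity on an existing table.
    dynamodb.update_table(
        TableName=table,
        ProvisionedThroughput={
            "ReadCapacityUnits": read,
            "WriteCapacityUnits": write,
        },
    )

set_throughput("my-table", read=50, write=1000)  # before the import runs
# ... pipeline does its upload ...
set_throughput("my-table", read=50, write=10)    # after it finishes
```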
0
votes
1 answer

Tagging EC2 machines in Pipeline's EMR Cluster (ImportCluster in the S3->DynamoDB example)

I'm trying to run the S3->DynamoDB example and having some problems running the EMR cluster that is created for the MyImportJob activity. We configured our IAM accounts such that every user can create EC2 machines with a specific 'team_id' tag (of…
Zach Moshe
  • 2,782
  • 4
  • 24
  • 40
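
One approach worth sketching here, with the caveat that it rests on my reading of the tag-propagation behavior: tags attached at the pipeline level are passed on to the EC2 instances and EMR clusters that Data Pipeline launches, so adding the required 'team_id' tag to the pipeline itself may satisfy the IAM constraint. The pipeline ID and tag value are hypothetical:

```python
import boto3

client = boto3.client("datapipeline", region_name="us-east-1")

# Hypothetical pipeline ID and team tag; Data Pipeline propagates pipeline
# tags to the resources it spawns (EC2 instances, EMR clusters).
client.add_tags(
    pipelineId="df-00000000000000000000",
    tags=[{"key": "team_id", "value": "data-eng"}],
)
```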
0
votes
2 answers

All my AWS data pipelines have stopped working with a validation error

I use AWS data pipelines to automatically back up DynamoDB tables to S3 on a weekly basis. All of my data pipelines stopped working two weeks ago. After some investigation, I see that EMR fails with "validation error" and "Terminated…
Ali
  • 18,665
  • 21
  • 103
  • 138
0
votes
1 answer

Failing pipelines for DynamoDB cross-region incremental copying

I'm trying to implement cross-region copying from us-east-1 to us-west-1. I used the cross-region copying template in Amazon Data Pipeline to copy a table every couple of hours; however, I can't get incremental copying to work. I have to fill in the…
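
A hedged sketch of one way to make such a copy incremental, assuming the items carry a numeric updated_at attribute to filter on; the table names, regions, and attribute are hypothetical:

```python
import boto3
from boto3.dynamodb.conditions import Attr

# Scan only items modified since the last run and batch-write them to the
# destination region. Assumes a numeric 'updated_at' attribute on each item.
src = boto3.resource("dynamodb", region_name="us-east-1").Table("my-table")
dst = boto3.resource("dynamodb", region_name="us-west-1").Table("my-table")

def copy_since(last_run_ts):
    scan_kwargs = {"FilterExpression": Attr("updated_at").gte(last_run_ts)}
    with dst.batch_writer() as batch:
        while True:
            page = src.scan(**scan_kwargs)
            for item in page["Items"]:
                batch.put_item(Item=item)
            if "LastEvaluatedKey" not in page:
                break
            scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

copy_since(1420070400)  # hypothetical epoch timestamp of the previous run
```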
0
votes
1 answer

Hello World pipeline with ShellCommandActivity

I'm trying to create a simple data-flow pipeline with a single activity of type ShellCommandActivity. I've attached the configuration of the activity and the EC2 resource. When I execute this, the Ec2Resource sits in the WAITING_ON_DEPENDENCIES state and then…
webber
  • 1,834
  • 5
  • 24
  • 56
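
For comparison, a minimal definition pairing one Ec2Resource with one ShellCommandActivity. This is a sketch only, with hypothetical IDs, the default Data Pipeline roles, and an on-demand schedule assumed; the resource stays in WAITING_ON_DEPENDENCIES until every dependency of the activity it backs is satisfied:

```python
import boto3

def f(key, value):
    # Shorthand for a string-valued pipeline field.
    return {"key": key, "stringValue": value}

objects = [
    {"id": "Default", "name": "Default", "fields": [
        f("scheduleType", "ondemand"),
        f("role", "DataPipelineDefaultRole"),
        f("resourceRole", "DataPipelineDefaultResourceRole"),
        f("failureAndRerunMode", "CASCADE"),
    ]},
    {"id": "MyEc2", "name": "MyEc2", "fields": [
        f("type", "Ec2Resource"),
        f("instanceType", "t1.micro"),
        f("terminateAfter", "30 Minutes"),
    ]},
    {"id": "HelloWorld", "name": "HelloWorld", "fields": [
        f("type", "ShellCommandActivity"),
        f("command", "echo hello world"),
        {"key": "runsOn", "refValue": "MyEc2"},
    ]},
]

client = boto3.client("datapipeline", region_name="us-east-1")
client.put_pipeline_definition(
    pipelineId="df-00000000000000000000",  # hypothetical pipeline ID
    pipelineObjects=objects,
)
```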
0
votes
1 answer

Amazon Data Pipeline regex format

I have been trying to parse a huge volume of logs from an Amazon S3 bucket. So far, I have created and configured a sample pipe, as described in the tutorial video. However, for some reason my regex is screwed up after the pipe is activated. Originally, the regex…
jdevelop
  • 12,176
  • 10
  • 56
  • 112
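
When a pipeline's RegEx data format misbehaves, it can help to sanity-check the pattern locally first. The pipeline evaluates it server-side (Java-style regex, as I understand it), but a quick Python check catches most escaping mistakes; the pattern and sample line below are hypothetical:

```python
import re

# Common-log-format style pattern; test it against a known-good line
# before pasting it into the pipeline definition.
pattern = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)'
)
sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326'
match = pattern.match(sample)
print(match.groups() if match else "no match")
```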
0
votes
1 answer

AWS Data Pipeline backup/restore and validation

I am new to AWS Data Pipeline and I need to back up DynamoDB to an S3 bucket, then restore from that backup into a restored DynamoDB table, and then validate the records, i.e. check that the number of records in the S3 backup and the restored DynamoDB…
Varun
  • 1,159
  • 1
  • 14
  • 19
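
A sketch of the validation step: count the records in the S3 export and compare with the item count of the restored table. It assumes the backup is uncompressed, newline-delimited text (as the EMR export job writes it); the bucket, prefix, and table names are hypothetical:

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")
dynamodb = boto3.client("dynamodb", region_name="us-east-1")

def s3_record_count(bucket, prefix):
    # Sum the non-empty lines across every object under the backup prefix.
    count = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"]
            count += sum(1 for line in body.iter_lines() if line.strip())
    return count

def table_item_count(table):
    # COUNT-only scan avoids pulling item payloads back.
    total = 0
    paginator = dynamodb.get_paginator("scan")
    for page in paginator.paginate(TableName=table, Select="COUNT"):
        total += page["Count"]
    return total

print(s3_record_count("my-backup-bucket", "backups/2015-01-01/"))
print(table_item_count("restored-table"))
```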
0
votes
1 answer

AWS Data Pipeline: list the contents of the output bucket in a ShellCommandActivity

How can I list the files contained in my output bucket from a shell script? ls ${OUTPUT1_STAGING_DIR} does not work, as I get the message that there's no file or directory by this name. I am sure there is an easy way to do this but I can't…
Biffy
  • 871
  • 2
  • 10
  • 21
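
As far as I can tell, ${OUTPUT1_STAGING_DIR} only resolves when staging is enabled on the activity and its output data node; failing that, the step can list the bucket directly. A boto3 sketch follows (the AWS CLI's `aws s3 ls` is the shell equivalent); the bucket and prefix are hypothetical:

```python
import boto3

# List every key under the output prefix, paginating past 1000 objects.
s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-output-bucket", Prefix="output/"):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"])
```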
0
votes
1 answer

Backup from an external data source to AWS S3 (using Data Pipeline)?

I am trying to move some log files, which are located on an external web server, to an Amazon S3 bucket. This should happen every 7 days without manual activation. Additionally, I'd like it to be "failsafe", so it probably would be best if the…
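
The fetch-and-store part might look like the following sketch, which a pipeline schedule (or cron) would trigger every 7 days; the URL, bucket, and key are hypothetical, and retries/error handling are omitted:

```python
import urllib.request

import boto3

# Fetch a remote log file and archive it in S3.
url = "https://www.example.com/logs/access.log"  # hypothetical source
data = urllib.request.urlopen(url).read()

s3 = boto3.client("s3")
s3.put_object(Bucket="my-log-archive", Key="logs/access.log", Body=data)
```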
0
votes
1 answer

What to use for exporting DynamoDB

I would like to create a data pipeline that will export data from DynamoDB and import it into S3. Everything seems fine, but there is a problem: my data in DynamoDB is binary, and in the pipeline settings there is no accepted data type such as…
Ducaz035
  • 3,054
  • 2
  • 25
  • 45
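
One workaround sketch for binary attributes: base64-encode them into a text-friendly format before shipping to S3. This assumes boto3's Binary wrapper around binary values; the table and file names are hypothetical:

```python
import base64
import json

import boto3

table = boto3.resource("dynamodb", region_name="us-east-1").Table("my-table")

def encode(item):
    # boto3 wraps binary attributes in a Binary object exposing .value.
    out = {}
    for key, value in item.items():
        if hasattr(value, "value"):
            out[key] = base64.b64encode(value.value).decode("ascii")
        else:
            out[key] = value
    return out

with open("export.jsonl", "w") as fh:
    kwargs = {}
    while True:
        page = table.scan(**kwargs)
        for item in page["Items"]:
            fh.write(json.dumps(encode(item), default=str) + "\n")
        if "LastEvaluatedKey" not in page:
            break
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```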
0
votes
2 answers

Amazon SNS configuration for Data Pipeline success and failure

I am using Amazon Data Pipeline to automate some shell activity, which will run once a day. So I was configuring Amazon SNS to let me know whether the last run of the shell activity succeeded or failed. If it failed, then…
Naresh
  • 5,073
  • 12
  • 67
  • 124
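
Data Pipeline's built-in hook for this is the SnsAlarm object, wired to an activity's onSuccess/onFail fields. A definition sketch in the same put_pipeline_definition shape used earlier, with a hypothetical topic ARN, role, and IDs:

```python
import boto3

def f(key, value):
    # Shorthand for a string-valued pipeline field.
    return {"key": key, "stringValue": value}

topic = "arn:aws:sns:us-east-1:123456789012:pipeline-status"  # hypothetical

objects = [
    {"id": "SuccessAlarm", "name": "SuccessAlarm", "fields": [
        f("type", "SnsAlarm"),
        f("topicArn", topic),
        f("role", "DataPipelineDefaultRole"),
        f("subject", "Pipeline succeeded"),
        f("message", "Daily shell activity finished normally."),
    ]},
    {"id": "FailureAlarm", "name": "FailureAlarm", "fields": [
        f("type", "SnsAlarm"),
        f("topicArn", topic),
        f("role", "DataPipelineDefaultRole"),
        f("subject", "Pipeline FAILED"),
        f("message", "Daily shell activity failed; check the pipeline logs."),
    ]},
    {"id": "ShellJob", "name": "ShellJob", "fields": [
        f("type", "ShellCommandActivity"),
        f("command", "run_daily_job.sh"),      # hypothetical script
        {"key": "onSuccess", "refValue": "SuccessAlarm"},
        {"key": "onFail", "refValue": "FailureAlarm"},
    ]},
]

client = boto3.client("datapipeline", region_name="us-east-1")
client.put_pipeline_definition(
    pipelineId="df-00000000000000000000",  # hypothetical pipeline ID
    pipelineObjects=objects,
)
```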