Questions tagged [amazon-data-pipeline]

Simple service to transfer data between Amazon data storage services, kick off Elastic MapReduce jobs, and connect with outside data services.

From the AWS Data Pipeline homepage:

AWS Data Pipeline is a web service that helps you reliably process and move data between different AWS compute and storage services, as well as on-premises data sources, at specified intervals. With AWS Data Pipeline, you can regularly access your data where it’s stored, transform and process it at scale, and efficiently transfer the results to AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon Elastic MapReduce (EMR).

AWS Data Pipeline helps you easily create complex data processing workloads that are fault tolerant, repeatable, and highly available. You don’t have to worry about ensuring resource availability, managing inter-task dependencies, retrying transient failures or timeouts in individual tasks, or creating a failure notification system. AWS Data Pipeline also allows you to move and process data that was previously locked up in on-premises data silos.

470 questions
3
votes
1 answer

Data Pipeline dump from DynamoDB to S3 fails every time

I followed the instructions to set up dumps for DynamoDB: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-importexport-ddb-part2.html The Data Pipeline setup was fine, but after executing the task I get the same error every time. I researched this…
Vladimir Gilevich
  • 861
  • 1
  • 10
  • 17
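The excerpt cuts off before the actual error, but with a repeatedly failing export the usual first step is to pull the failure reason off the failed instances. A minimal boto3 sketch, assuming a hypothetical pipeline ID:

```python
import boto3

client = boto3.client("datapipeline")
PIPELINE_ID = "df-EXAMPLE123"  # hypothetical pipeline ID

# List object instances from recent runs, then inspect their status fields.
ids = client.query_objects(pipelineId=PIPELINE_ID, sphere="INSTANCE")["ids"]
if ids:
    for obj in client.describe_objects(pipelineId=PIPELINE_ID,
                                       objectIds=ids)["pipelineObjects"]:
        fields = {f["key"]: f.get("stringValue", "") for f in obj["fields"]}
        if fields.get("@status") == "FAILED":
            print(obj["id"], fields.get("@failureReason", "(no reason recorded)"))
```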
3
votes
2 answers

Using Amazon's Data Pipeline to back up an S3 bucket -- how to skip existing files and avoid unnecessary overwriting?

I'm using Amazon's Data Pipeline to copy an S3 bucket to another bucket. It's a pretty straightforward setup, and runs nightly. However, every subsequent run copies the same files over and over; I'd rather it just skip existing files and copy only…
trevorhinesley
  • 845
  • 1
  • 10
  • 36
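The built-in CopyActivity recopies everything, so an incremental copy has to diff the buckets itself. A minimal boto3 sketch (bucket names are placeholders) that copies only keys missing from the destination:

```python
import boto3

s3 = boto3.client("s3")
SRC, DST = "source-bucket", "backup-bucket"  # hypothetical bucket names

# Collect keys already present in the backup bucket.
existing = set()
for page in s3.get_paginator("list_objects_v2").paginate(Bucket=DST):
    existing.update(o["Key"] for o in page.get("Contents", []))

# Copy only the keys that are missing from the destination.
for page in s3.get_paginator("list_objects_v2").paginate(Bucket=SRC):
    for obj in page.get("Contents", []):
        if obj["Key"] not in existing:
            s3.copy_object(Bucket=DST, Key=obj["Key"],
                           CopySource={"Bucket": SRC, "Key": obj["Key"]})
```

Comparing ETag or LastModified as well would also catch changed files, not just new ones.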
3
votes
3 answers

AWS Data Pipeline - Error when trying to re-run a failed activity

My data pipeline has many activities (ShellCommandActivity), one of which failed due to a programming issue. However, when I try to re-run the failed activity after fixing the issue… The failure and rerun mode is cascade, and the schedule type is…
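For reference, failed activity instances can be marked for re-execution through the SetStatus API; with cascade failure mode, dependents in CASCADE_FAILED should be picked up as well. A sketch with hypothetical IDs:

```python
import boto3

client = boto3.client("datapipeline")
PIPELINE_ID = "df-EXAMPLE123"  # hypothetical
# Instance IDs as shown in the console's instance view (placeholder value).
FAILED_OBJECT_IDS = ["@MyShellCmd_2017-01-01T00:00:00"]

# RERUN re-executes the named instances once their dependencies are satisfied.
client.set_status(pipelineId=PIPELINE_ID,
                  objectIds=FAILED_OBJECT_IDS,
                  status="RERUN")
```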
3
votes
2 answers

How to restart an AWS Data Pipeline

I have a scheduled AWS Data Pipeline that failed partway through its execution. I fixed the problem without modifying the Pipeline in any way (changed a script in S3). However, there seems to be no good way to restart the Pipeline from the…
Simon Lepkin
  • 1,021
  • 1
  • 13
  • 25
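One hedged option when rerunning individual instances isn't enough: deactivate the pipeline and reactivate it from a chosen point in the schedule. A boto3 sketch (pipeline ID and timestamp are placeholders):

```python
from datetime import datetime, timezone

import boto3

client = boto3.client("datapipeline")
PIPELINE_ID = "df-EXAMPLE123"  # hypothetical

# Cancel anything still running, then resume from a chosen schedule point.
client.deactivate_pipeline(pipelineId=PIPELINE_ID, cancelActive=True)
client.activate_pipeline(
    pipelineId=PIPELINE_ID,
    startTimestamp=datetime(2017, 1, 1, tzinfo=timezone.utc),  # hypothetical resume point
)
```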
3
votes
1 answer

Flattening a JSON file while transferring from S3 to Redshift using AWS Data Pipeline

I have a JSON file on S3 that I want to transfer to Redshift. One catch is that the file contains entries in the following format: { "user_id":1, "metadata": { "connection_type":"WIFI", "device_id":"1234" …
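Nested fields like these can usually be flattened at load time with a JSONPaths file rather than a separate transform step. A sketch, with all S3 paths, table, and role names hypothetical:

```python
# JSONPaths file to upload to S3 (e.g. s3://my-bucket/flatten.jsonpaths -- hypothetical path).
JSONPATHS = """{
  "jsonpaths": [
    "$.user_id",
    "$.metadata.connection_type",
    "$.metadata.device_id"
  ]
}"""

# COPY statement to run against Redshift (e.g. via psycopg2); the column list
# must line up positionally with the jsonpaths entries.
COPY_SQL = """
COPY my_table (user_id, connection_type, device_id)
FROM 's3://my-bucket/input.json'
CREDENTIALS 'aws_iam_role=arn:aws:iam::123456789012:role/RedshiftCopyRole'
JSON 's3://my-bucket/flatten.jsonpaths';
"""
```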
3
votes
1 answer

Data Pipeline: Use only the first 4 values from a CSV in the pipeline

I have a CSV with a variable structure, and I only want to take the first 4 values from it. The CSV stored in S3 has between 7 and 8 fields, and I would like to take just the first 4. I have attempted to use the following prepared…
dojogeorge
  • 1,674
  • 3
  • 25
  • 35
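One common pattern is to trim the rows in a staging step (for example a ShellCommandActivity) before the copy runs. A minimal Python sketch with placeholder file names:

```python
import csv

# Trim each row to its first four fields ahead of the load step.
with open("input.csv", newline="") as src, \
     open("trimmed.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        writer.writerow(row[:4])  # keep only the first 4 values
```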
3
votes
1 answer

Importing data from S3 to DynamoDB

I am trying to import a JSON file, which has been uploaded to S3, into DynamoDB. I followed the tutorial Amazon has given: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-importexport-ddb-console-start.html But when I try to…
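Worth noting: the import template expects Data Pipeline's own export format, so an arbitrary JSON file generally won't load as-is. A hedged boto3 alternative, assuming one JSON object per line and placeholder bucket/table names:

```python
import json
from decimal import Decimal

import boto3

s3 = boto3.resource("s3")
table = boto3.resource("dynamodb").Table("MyTable")  # hypothetical table name

# parse_float=Decimal because the DynamoDB resource rejects Python floats.
body = s3.Object("my-bucket", "data/items.json").get()["Body"]
with table.batch_writer() as batch:
    for line in body.iter_lines():
        batch.put_item(Item=json.loads(line, parse_float=Decimal))
```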
3
votes
3 answers

Insert blanks as NULL into MySQL

I'm building an AWS pipeline to insert CSV files from S3 into an RDS MySQL DB. The problem I'm facing is that when it attempts to load the file, it treats blanks as empty strings instead of NULLs. For example, line 1 of the CSV…
rodrigocf
  • 1,951
  • 13
  • 39
  • 62
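If the load step itself can't be coerced, converting empty strings to None before the insert sidesteps the issue. A sketch using pymysql (an assumed driver; all connection, table, and column names are placeholders):

```python
import csv

import pymysql  # assumed MySQL driver; any DB-API client behaves the same

conn = pymysql.connect(host="my-rds-host", user="user",
                       password="secret", database="mydb")
cur = conn.cursor()
with open("input.csv", newline="") as f:
    for row in csv.reader(f):
        values = [v if v != "" else None for v in row]  # blank field -> SQL NULL
        cur.execute("INSERT INTO my_table (a, b, c) VALUES (%s, %s, %s)", values)
conn.commit()
```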
3
votes
3 answers

Need strategy advice for migrating large tables from RDS to DynamoDB

We have a couple of huge MySQL tables in RDS (over 700 GB) that we'd like to migrate to a DynamoDB table. Can you suggest a strategy, or a direction, to do this in a clean, parallelized way? Perhaps using EMR or AWS Data Pipeline.
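One hedged direction: shard the table by primary-key range and run one copier per range in parallel (on EMR, Data Pipeline, or plain EC2). A per-worker sketch with placeholder names:

```python
import boto3
import pymysql  # assumed driver

# Streaming cursor so 700 GB is never pulled into memory at once.
conn = pymysql.connect(host="my-rds-host", user="user", password="secret",
                       database="mydb",
                       cursorclass=pymysql.cursors.SSDictCursor)
table = boto3.resource("dynamodb").Table("MyTable")  # hypothetical

cur = conn.cursor()
# Each worker gets a disjoint key range, so many copies run side by side.
cur.execute("SELECT * FROM big_table WHERE id BETWEEN %s AND %s", (0, 1000000))
with table.batch_writer() as batch:
    for row in cur:
        # Drop NULLs; float columns may also need converting to Decimal.
        batch.put_item(Item={k: v for k, v in row.items() if v is not None})
```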
3
votes
2 answers

AWS Data Pipeline Redshift "delimiter not found" error

I'm working on a data pipeline. In one of the steps, a CSV from S3 is consumed by a Redshift DataNode. My Redshift table has 78 columns, checked with: SELECT COUNT(*) FROM information_schema.columns WHERE table_name = 'my_table'; After the failed…
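After a failed COPY, stl_load_errors usually pinpoints the offending line and column, which tends to reveal an embedded delimiter, stray quote, or short row. A sketch using psycopg2 (an assumed driver; connection details are placeholders):

```python
import psycopg2  # assumed Postgres driver for Redshift

conn = psycopg2.connect(host="my-cluster.redshift.amazonaws.com",  # placeholder
                        port=5439, dbname="mydb", user="user", password="secret")
cur = conn.cursor()
# Show the most recent load failures with the raw line Redshift choked on.
cur.execute("""
    SELECT starttime, filename, line_number, colname, err_reason, raw_line
    FROM stl_load_errors
    ORDER BY starttime DESC
    LIMIT 5;
""")
for row in cur.fetchall():
    print(row)
```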
3
votes
1 answer

Data Pipeline S3 logs not written (only written if using Amazon Linux)

With the exact same Data Pipeline configuration, differing only in the AMI used (Amazon Linux vs. Ubuntu), my Data Pipeline execution succeeds in both cases, but it only writes logs to S3 when using Amazon Linux. With Amazon Linux With…
deprecated
  • 5,142
  • 3
  • 41
  • 62
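For comparison, the log destination itself is set by pipelineLogUri on the Default object, so confirming it is present is a cheap first check when switching AMIs. A sketch of that object (shown in isolation; a real PutPipelineDefinition call must include the full object set, and all IDs and paths are placeholders):

```python
import boto3

client = boto3.client("datapipeline")

# Default object carrying the S3 log destination for Task Runner output.
default_obj = {
    "id": "Default",
    "name": "Default",
    "fields": [
        {"key": "pipelineLogUri", "stringValue": "s3://my-bucket/logs/"},
        {"key": "scheduleType", "stringValue": "cron"},
    ],
}
# NOTE: put_pipeline_definition replaces the whole definition; include every
# pipeline object in a real call, not just Default.
client.put_pipeline_definition(pipelineId="df-EXAMPLE123",
                               pipelineObjects=[default_obj])
```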
3
votes
2 answers

How can I specify an EBS volume when adding an EC2 resource to AWS Data Pipeline?

When I try to create an EC2 resource with AWS Data Pipeline, I don't see an option for defining the EBS volume that will be associated with that compute instance. Is it possible to set the volume size? If so, can someone give me an example?
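As far as I can tell, Ec2Resource exposes no EBS-size field directly; the usual workaround is a custom AMI whose root volume is already the size you need, referenced via imageId. A sketch of the object definition (AMI ID and names are placeholders):

```python
# Ec2Resource pipeline object; the custom AMI supplies the larger root volume.
ec2_resource = {
    "id": "MyEC2Resource",
    "name": "MyEC2Resource",
    "fields": [
        {"key": "type", "stringValue": "Ec2Resource"},
        {"key": "instanceType", "stringValue": "m3.xlarge"},
        {"key": "imageId", "stringValue": "ami-0123456789abcdef0"},  # hypothetical custom AMI
        {"key": "terminateAfter", "stringValue": "2 Hours"},
    ],
}
```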
3
votes
1 answer

How to set instance role for EMR clusters launched via data pipeline?

I'm trying to attach an instance role to a cluster I'm running through Data Pipeline. I'd like to run my own mapper script that needs write permissions to DynamoDB (the "regular" Hive upload won't do the trick for me). I've gone through the API docs…
Zach Moshe
  • 2,782
  • 4
  • 24
  • 40
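For reference, EmrCluster takes both role (the Data Pipeline service role) and resourceRole (the instance profile the cluster nodes assume), so DynamoDB write permissions belong on the latter. A sketch with placeholder role names:

```python
# EmrCluster pipeline object; resourceRole is the instance profile whose
# policy needs the DynamoDB write permissions.
emr_cluster = {
    "id": "MyEmrCluster",
    "name": "MyEmrCluster",
    "fields": [
        {"key": "type", "stringValue": "EmrCluster"},
        {"key": "role", "stringValue": "DataPipelineDefaultRole"},
        {"key": "resourceRole", "stringValue": "MyDynamoWriterInstanceProfile"},
        {"key": "coreInstanceCount", "stringValue": "2"},
    ],
}
```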
3
votes
4 answers

Do I need to set up a backup data pipeline for AWS DynamoDB on a daily basis?

I am considering using AWS DynamoDB for an application we are building. I understand that setting up a backup job that exports data from DynamoDB to S3 involves a data pipeline with EMR. But my question is: do I need to worry about having a backup job…
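If the EMR-based pipeline feels heavyweight, a hedged alternative is DynamoDB's on-demand backup API driven by cron or a scheduled Lambda. A minimal sketch (table and backup names are placeholders):

```python
from datetime import datetime, timezone

import boto3

# Daily on-demand backup, named by date so runs don't collide.
dynamodb = boto3.client("dynamodb")
dynamodb.create_backup(
    TableName="MyTable",  # hypothetical table
    BackupName="MyTable-%s" % datetime.now(timezone.utc).strftime("%Y-%m-%d"),
)
```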
3
votes
1 answer

Using a custom AMI (with s3cmd) in a Data Pipeline

How can I install s3cmd on an AMI that is used in the pipeline? This should be a fairly basic thing to do, but I can't seem to get it done. Here's what I've tried: started a pipeline without the image-id option => everything works fine; navigated to…
Biffy
  • 871
  • 2
  • 10
  • 21
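A hedged alternative to baking s3cmd into the AMI: install it at runtime from the activity itself. A sketch of a ShellCommandActivity object (the runsOn reference is a placeholder):

```python
# ShellCommandActivity that installs s3cmd before using it, avoiding the
# custom-AMI step entirely.
install_and_run = {
    "id": "InstallS3cmd",
    "name": "InstallS3cmd",
    "fields": [
        {"key": "type", "stringValue": "ShellCommandActivity"},
        {"key": "runsOn", "refValue": "MyEC2Resource"},  # hypothetical resource
        {"key": "command",
         "stringValue": "sudo pip install s3cmd && s3cmd --version"},
    ],
}
```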