Questions tagged [amazon-data-pipeline]

Simple service to transfer data between Amazon data storage services, kick off Elastic MapReduce jobs, and connect with outside data services.

From the AWS Data Pipeline homepage:

AWS Data Pipeline is a web service that helps you reliably process and move data between different AWS compute and storage services as well as on-premise data sources at specified intervals. With AWS Data Pipeline, you can regularly access your data where it’s stored, transform and process it at scale, and efficiently transfer the results to AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon Elastic MapReduce (EMR).

AWS Data Pipeline helps you easily create complex data processing workloads that are fault tolerant, repeatable, and highly available. You don’t have to worry about ensuring resource availability, managing inter-task dependencies, retrying transient failures or timeouts in individual tasks, or creating a failure notification system. AWS Data Pipeline also allows you to move and process data that was previously locked up in on-premise data silos.
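For orientation, here is a minimal sketch of driving the service from Python with boto3. The pipeline name, uniqueId, and IAM role names are placeholder assumptions, not values from any question below; a real definition would add data nodes, activities, and resources to the object list.

```python
import boto3

# Minimal sketch: create, define, and activate a pipeline with boto3.
# Names and IAM roles below are placeholders for illustration only.
dp = boto3.client("datapipeline", region_name="us-east-1")

pipeline_id = dp.create_pipeline(name="example-pipeline",
                                 uniqueId="example-pipeline-001")["pipelineId"]

# A bare-bones definition: just the Default object; real pipelines add
# data nodes, activities, and resources as further objects in this list.
dp.put_pipeline_definition(
    pipelineId=pipeline_id,
    pipelineObjects=[{
        "id": "Default",
        "name": "Default",
        "fields": [
            {"key": "scheduleType", "stringValue": "ONDEMAND"},
            {"key": "failureAndRerunMode", "stringValue": "CASCADE"},
            {"key": "role", "stringValue": "DataPipelineDefaultRole"},
            {"key": "resourceRole", "stringValue": "DataPipelineDefaultResourceRole"},
        ],
    }],
)

dp.activate_pipeline(pipelineId=pipeline_id)
```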

470 questions
5 votes, 2 answers

Copying DynamoDB table to another DynamoDB table with transforms

I have two DynamoDB tables: Table_1 and Table_2. I am trying to deprecate Table_1 and copy information into Table_2 from Table_1, which has different GSIs and different LSIs. Table_1 attributes are: Id, state, isReused, empty, normal Table_2…
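Outside Data Pipeline, a copy like the one in the question above can also be scripted directly. Below is a minimal boto3 sketch (table names and the transform are placeholders) that scans the source table and rewrites each item into the target.

```python
import boto3

# Sketch: copy every item from one DynamoDB table to another, applying a
# per-item transform along the way. Table names and the transform body are
# placeholders for illustration only.
dynamodb = boto3.resource("dynamodb")
source = dynamodb.Table("Table_1")
target = dynamodb.Table("Table_2")

def transform(item):
    # Adjust attributes here so the item fits Table_2's keys/GSIs/LSIs.
    return item

scan_kwargs = {}
with target.batch_writer() as batch:
    while True:
        page = source.scan(**scan_kwargs)
        for item in page["Items"]:
            batch.put_item(Item=transform(item))
        if "LastEvaluatedKey" not in page:
            break
        scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```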
5 votes, 1 answer

Amazon Data Pipeline "Load S3 Data to RDS MySQL" query format?

I was wondering what the SQL query format for inserting data from a CSV into MySQL would be. The template it gives is, "INSERT INTO tablename (col1, col2, col3) VALUES (?,?,?);" Because the values are dynamic and different in each CSV file,…
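The key idea behind that template is that the ? placeholders bind to CSV columns positionally, so the same query works for every row. A small standalone illustration of that positional binding follows; sqlite3 is used only because it shares the ? placeholder style, and the table and file names are made up.

```python
import csv
import sqlite3

# Illustration of positional "?" binding: each CSV row supplies the values
# for (col1, col2, col3) in order. sqlite3 is used only because it shares
# the "?" placeholder style; the real template targets MySQL on RDS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (col1 TEXT, col2 TEXT, col3 TEXT)")

# Assumes data.csv has exactly three columns per row.
with open("data.csv", newline="") as f:
    rows = list(csv.reader(f))

conn.executemany("INSERT INTO tablename (col1, col2, col3) VALUES (?,?,?)", rows)
conn.commit()
```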
5 votes, 1 answer

Amazon EMR job with multiple input parameters

In Amazon Data Pipeline, I am creating an activity to copy from S3 to EMR using Hive. To achieve this I have to pass two input parameters into the EMR job as a step. I have searched almost every piece of Data Pipeline documentation but did not find a way to specify…
Irfan.gwb • 668 • 2 • 13 • 35
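One commonly cited approach for the question above (not confirmed by its answers) is to pass Hive script parameters as -d arguments inside the EmrActivity step string, which is comma-separated. A rough sketch of such a pipeline object as a Python dict; every path, id, and parameter name here is a placeholder assumption.

```python
# Rough sketch of an EmrActivity definition that passes two parameters to a
# Hive script via "-d" arguments in the comma-separated step string.
# All bucket names, script paths, and ids are placeholders/assumptions.
emr_activity = {
    "id": "HiveCopyStep",
    "name": "HiveCopyStep",
    "fields": [
        {"key": "type", "stringValue": "EmrActivity"},
        {"key": "runsOn", "refValue": "EmrClusterResource"},
        {"key": "step", "stringValue": (
            "s3://elasticmapreduce/libs/script-runner/script-runner.jar,"
            "s3://elasticmapreduce/libs/hive/hive-script,--run-hive-script,"
            "--args,-f,s3://my-bucket/scripts/copy.q,"
            "-d,INPUT=s3://my-bucket/input/,"
            "-d,OUTPUT=s3://my-bucket/output/"
        )},
    ],
}
```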
5 votes, 2 answers

AWS Data Pipeline datetime variable

I am using AWS Data Pipeline to save a text file to my S3 bucket from RDS. I would like the file name to include the date and the hour, like: myfile-YYYYMMDD-HH.txt myfile-20140813-12.txt I have specified my S3DataNode FilePath…
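The usual answer to this kind of question is Data Pipeline's expression language, where #{format(@scheduledStartTime, ...)} can be embedded in the S3 path. A sketch of such an S3DataNode definition as a Python dict; the bucket name and node id are placeholders.

```python
# Sketch of an S3DataNode whose filePath embeds the scheduled start time.
# The bucket name and node id are placeholders; #{format(...)} is Data
# Pipeline's expression syntax, evaluated at run time.
s3_output_node = {
    "id": "S3OutputNode",
    "name": "S3OutputNode",
    "fields": [
        {"key": "type", "stringValue": "S3DataNode"},
        {"key": "filePath", "stringValue":
            "s3://my-bucket/exports/myfile-#{format(@scheduledStartTime, 'YYYYMMdd-HH')}.txt"},
    ],
}
```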
5 votes, 1 answer

How to use scriptVariables in hive (AWS Data Pipeline)

We can pass script variables into an AWS Data Pipeline HiveActivity using the following construct: "scriptVariable" : [ "param1=value1", "param2=value2" ] How do we access these variables in the Hive script? I have been trying to use them…
Santanu C • 1,362 • 3 • 20 • 38
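In Hive itself, variables passed this way are typically referenced with ${...} substitution. A hedged sketch for the question above, pairing a HiveActivity's scriptVariable entries with a script that uses them; ids, table names, and values are placeholders.

```python
# Sketch: scriptVariable entries become ${param1} / ${param2} substitutions
# inside the Hive script. Ids, table names, and values are placeholders;
# ${input1}/${output1} refer to the activity's staged data nodes.
hive_activity = {
    "id": "HiveActivity",
    "name": "HiveActivity",
    "fields": [
        {"key": "type", "stringValue": "HiveActivity"},
        {"key": "scriptVariable", "stringValue": "param1=value1"},
        {"key": "scriptVariable", "stringValue": "param2=value2"},
        {"key": "hiveScript", "stringValue": (
            "INSERT OVERWRITE TABLE ${output1} "
            "SELECT * FROM ${input1} "
            "WHERE col1 = '${param1}' AND col2 = '${param2}';"
        )},
    ],
}
```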
5 votes, 2 answers

AWS Copy S3 to RDS

I am trying to copy from S3 (.csv file) to RDS (MySQL) using Amazon Data Pipeline, and my error is: Error copying record Cause: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure The last packet sent successfully to…
4 votes, 2 answers

AWS Data Pipeline not showing EC2 instance role

I am trying to get data from S3 to DynamoDB using AWS Data Pipeline. The issue I am facing is that my Data Pipeline isn't showing the EC2 instance role even though I have created one in IAM. I have created default roles for Pipeline and…
4 votes, 0 answers

HIVE_CURSOR_ERROR: Unexpected end of input stream

I'm moving data from MySQL to S3 using Data Pipeline, and it creates an empty file for a couple of days. I believe this is making my Athena query fail with "HIVE_CURSOR_ERROR: Unexpected end of input stream". Below is my script CREATE EXTERNAL…
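A common workaround for empty exports tripping up Athena (not necessarily the fix this question settled on) is to drop zero-byte objects from the query location before running the query. A boto3 sketch with a placeholder bucket and prefix:

```python
import boto3

# Sketch: remove zero-byte objects under an S3 prefix so Athena does not
# choke on empty export files. Bucket and prefix are placeholders.
s3 = boto3.client("s3")
bucket, prefix = "my-export-bucket", "mysql-export/"

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        if obj["Size"] == 0:
            s3.delete_object(Bucket=bucket, Key=obj["Key"])
```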
4 votes, 2 answers

Bulk add TTL column to DynamoDB table

I have a use case where I need to add a TTL column to an existing table. Currently, this table has more than 2 billion records. Is there any existing solution built around this? Or is EMR the path forward?
Vivek Goel • 22,942 • 29 • 114 • 186
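At 2 billion items a parallel EMR/Glue job is the usual route, but the mechanics of the backfill itself look like this hedged boto3 sketch; the table name, key attribute, TTL attribute name, and retention period are all placeholders.

```python
import time
import boto3

# Sketch: backfill a "ttl" attribute (epoch seconds) on every existing item.
# Table name, key attribute, and the 90-day retention are placeholders; at
# ~2 billion items this should really run as a parallel EMR/Glue job.
table = boto3.resource("dynamodb").Table("my-table")
expiry = int(time.time()) + 90 * 24 * 3600

scan_kwargs = {"ProjectionExpression": "Id"}
while True:
    page = table.scan(**scan_kwargs)
    for item in page["Items"]:
        table.update_item(
            Key={"Id": item["Id"]},
            UpdateExpression="SET #ttl = :t",
            ExpressionAttributeNames={"#ttl": "ttl"},
            ExpressionAttributeValues={":t": expiry},
        )
    if "LastEvaluatedKey" not in page:
        break
    scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```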
4 votes, 4 answers

What's the best way to run a python script daily?

I have a python script that connects to Redshift, executes a series of SQL commands, and generates a new derived table. But for the life of me, I can't figure out a way to have it automatically run every day. I've tried AWS Data Pipeline but my…
ScottieB • 3,958 • 6 • 42 • 60
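One alternative to Data Pipeline for "run this daily" is a CloudWatch Events/EventBridge schedule pointing at a Lambda (or other target) that runs the script. A hedged boto3 sketch; the rule name, schedule, and Lambda ARN are placeholders, and the Lambda would also need permission to be invoked by the rule.

```python
import boto3

# Sketch: create a daily schedule rule and point it at an existing Lambda
# function that runs the Redshift script. Rule name, schedule, and ARN are
# placeholders; the Lambda also needs an invoke permission for this rule.
events = boto3.client("events")

events.put_rule(
    Name="daily-redshift-refresh",
    ScheduleExpression="cron(0 6 * * ? *)",  # 06:00 UTC every day
    State="ENABLED",
)

events.put_targets(
    Rule="daily-redshift-refresh",
    Targets=[{
        "Id": "redshift-refresh-lambda",
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:refresh-derived-table",
    }],
)
```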
4 votes, 0 answers

Output an AWS Data Pipeline TableBackupActivity to multiple S3 locations?

I have set up an AWS Data Pipeline that exports DynamoDB data into an S3DataNode, using the DynamoDB->Export menu option that sets up the basic pipeline template. I run that once a day, and it outputs into an S3 folder like "TableName/DATE/". I set that…
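If the pipeline itself can't write to two S3DataNodes, a post-export copy step is a simple fallback. Here is a hedged boto3 sketch that mirrors one day's export prefix to a second location; the bucket names, prefix, and date value are placeholders.

```python
import boto3

# Sketch: after the daily export lands in "TableName/DATE/", mirror those
# objects to a second bucket. Bucket names, prefix, and date are placeholders.
s3 = boto3.resource("s3")
src_bucket, dst_bucket = "backup-bucket", "secondary-bucket"
prefix = "TableName/2024-01-01/"

for obj in s3.Bucket(src_bucket).objects.filter(Prefix=prefix):
    s3.Object(dst_bucket, obj.key).copy({"Bucket": src_bucket, "Key": obj.key})
```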
4 votes, 1 answer

Creating column headers in CSV/TSV files using AWS Data Pipeline?

I'm creating CSV & TSV files using AWS Data Pipeline. The files are created just fine, but I can't figure out how to create files with column headers. At first, I expected the headers to be generated automatically based on the SQL query I'm running to…
T. Brian Jones • 13,002 • 25 • 78 • 117
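A common workaround (not necessarily the one this question settled on) is to prepend a header line to the exported object after the pipeline finishes. A boto3 sketch with a placeholder bucket, key, and column names:

```python
import boto3

# Sketch: download the pipeline's CSV output, prepend a header row, and
# write it back. Bucket, key, and column names are placeholders.
s3 = boto3.client("s3")
bucket, key = "my-export-bucket", "exports/report.csv"
header = "col1,col2,col3\n"

body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
s3.put_object(Bucket=bucket, Key=key, Body=(header + body).encode("utf-8"))
```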
4 votes, 1 answer

AWS Data Pipeline ServiceAccessSecurityGroup

When I try to create an EmrCluster resource with these properties: Emr Managed Master Security Group Id, Emr Managed Slave Security Group Id, I get this error: Terminated with errors. You must also specify a ServiceAccessSecurityGroup if you use…
bbenjii123 • 41 • 1 • 6
4 votes, 1 answer

AWS Data Pipeline - How to set global pipeline variable from ShellCommandActivity

I am trying to augment my pipeline (which migrates data from RDS to Redshift) so that it selects all rows whose id is greater than the maximum id that exists in Redshift. I have a script in Python that calculates this value and returns it to the output. I…
user2694306 • 3,832 • 10 • 47 • 95
4 votes, 1 answer

Can I have a data-pipeline as a part of my cloud-formation template?

My app has an S3 bucket with daily feeds, 2 DynamoDB tables that store this data, an ELB application that exposes a JSON API to that data, and a data pipeline flow that processes the incoming data and uploads it into the tables. My CloudFormation…
Zach Moshe • 2,782 • 4 • 24 • 40
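CloudFormation does have an AWS::DataPipeline::Pipeline resource type, so the pipeline can live in the same template as the bucket, tables, and ELB. A hedged sketch of such a template fragment, built as a Python dict; every logical id, name, and field value is a placeholder, and a real pipeline would add activities, data nodes, and resources (plus any required parameter objects) to the definition.

```python
import json

# Sketch of a CloudFormation template fragment declaring a pipeline resource.
# Logical ids, names, and field values are placeholders; a real pipeline adds
# activities, data nodes, and resources to PipelineObjects.
template = {
    "Resources": {
        "DailyFeedPipeline": {
            "Type": "AWS::DataPipeline::Pipeline",
            "Properties": {
                "Name": "daily-feed-pipeline",
                "Activate": True,
                "PipelineObjects": [{
                    "Id": "Default",
                    "Name": "Default",
                    "Fields": [
                        {"Key": "scheduleType", "StringValue": "ONDEMAND"},
                        {"Key": "role", "StringValue": "DataPipelineDefaultRole"},
                        {"Key": "resourceRole", "StringValue": "DataPipelineDefaultResourceRole"},
                    ],
                }],
            },
        }
    }
}

print(json.dumps(template, indent=2))
```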