Questions tagged [amazon-data-pipeline]

Simple service to transfer data between Amazon data storage services, kick off Elastic MapReduce jobs, and connect with outside data services.

From the AWS Data Pipeline homepage:

AWS Data Pipeline is a web service that helps you reliably process and move data between different AWS compute and storage services as well as on-premises data sources at specified intervals. With AWS Data Pipeline, you can regularly access your data where it’s stored, transform and process it at scale, and efficiently transfer the results to AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon Elastic MapReduce (EMR).

AWS Data Pipeline helps you easily create complex data processing workloads that are fault tolerant, repeatable, and highly available. You don’t have to worry about ensuring resource availability, managing inter-task dependencies, retrying transient failures or timeouts in individual tasks, or creating a failure notification system. AWS Data Pipeline also allows you to move and process data that was previously locked up in on-premises data silos.

470 questions
0 votes, 0 answers

Unable to create an AWS Data Pipeline for copying S3 to Redshift

I am new to AWS; I'm trying to create a data pipeline to transfer S3 files into Redshift. I have already performed the same task manually. Now, with the pipeline, I am unable to proceed further here. Problem with Copy Options: Sample data on S3 files…
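
A quick way to narrow down this kind of "Problem with Copy Options" is to test the COPY statement against Redshift directly before putting the same options into the pipeline's RedshiftCopyActivity. A minimal sketch, assuming psycopg2 and placeholder cluster, bucket, and IAM role names:

    # Hypothetical sketch: validate the COPY options by hand before wiring
    # them into a RedshiftCopyActivity. All names and the IAM role ARN are
    # placeholders.
    import psycopg2

    conn = psycopg2.connect(
        host="my-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="dev", user="admin", password="...")
    copy_sql = """
        COPY public.sample_table
        FROM 's3://my-bucket/input/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
        CSV IGNOREHEADER 1 DELIMITER ',' REGION 'us-east-1';
    """
    with conn, conn.cursor() as cur:
        cur.execute(copy_sql)
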
0 votes, 1 answer

How to run multiple Hive activities in parallel using AWS data pipeline?

We want to use AWS Data Pipeline to automate our data ingestion process. In our ingestion process, we mainly copy CSV files into an S3 bucket and run Hive queries on them for more than 100 different tables. We want to create one pipeline in which we will be…
Shekhar • 11,438 • 36 • 130 • 186
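
One pipeline can hold a single EmrCluster resource plus one HiveActivity per table; activities that share the cluster and have no dependencies between them can be scheduled in parallel. A minimal sketch of generating such a definition file, with placeholder table names, release label, and roles (a real HiveActivity also needs input/output data-node refs, omitted here):

    # Hypothetical sketch: one shared EMR cluster, one HiveActivity per table.
    import json

    tables = ["orders", "customers", "events"]  # placeholder table list

    objects = [
        {"id": "Default", "name": "Default", "scheduleType": "ondemand",
         "role": "DataPipelineDefaultRole",
         "resourceRole": "DataPipelineDefaultResourceRole"},
        {"id": "EmrClusterForHive", "name": "EmrClusterForHive",
         "type": "EmrCluster", "releaseLabel": "emr-5.23.0",
         "terminateAfter": "4 Hours"},
    ]
    for t in tables:
        objects.append({
            "id": f"HiveActivity_{t}", "name": f"HiveActivity_{t}",
            "type": "HiveActivity",
            "runsOn": {"ref": "EmrClusterForHive"},
            # a real HiveActivity also needs "input"/"output" data nodes
            "hiveScript": f"SELECT COUNT(*) FROM {t};",  # placeholder query
        })

    with open("pipeline-definition.json", "w") as f:
        json.dump({"objects": objects}, f, indent=2)
    # then: aws datapipeline put-pipeline-definition --pipeline-id <id> \
    #       --pipeline-definition file://pipeline-definition.json
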
0 votes, 1 answer

How to specify column mapping in AWS Data pipeline?

I am using AWS Data Pipeline to copy data from Redshift to MySQL in RDS. The data is copied to MySQL. In the pipeline, the insert query is specified as below: insert into test_Employee(firstname,lastname,email,salary) values(?,?,?,?); Is there any…
Anshuman Jasrotia • 3,135 • 8 • 48 • 81
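
With a plain CopyActivity the column mapping is positional: the columns produced by the source node's selectQuery are bound, in order, to the ? placeholders in the destination node's insertQuery. A minimal sketch with assumed SqlDataNode fields and placeholder IDs:

    # Hypothetical sketch: mapping is by position, not by name.
    source_node = {
        "id": "RedshiftSource", "type": "SqlDataNode",  # placeholder ids/types
        "table": "employee",
        "selectQuery": "SELECT firstname, lastname, email, salary FROM employee",
    }
    dest_node = {
        "id": "MySqlDest", "type": "SqlDataNode",
        "table": "test_Employee",
        "insertQuery": ("INSERT INTO test_Employee "
                        "(firstname, lastname, email, salary) VALUES (?, ?, ?, ?)"),
    }
    # Reordering either the SELECT list or the INSERT column list changes
    # which source column lands in which destination column.
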
0 votes, 1 answer

Data Pipeline task stuck in WAITING_FOR_RUNNER state

I have created a simple ShellCommandActivity which echoes some text. It runs on a plain EC2 (VPC) instance. I see that the host has spun up, but it never executes the tasks and the task remains in WAITING_FOR_RUNNER status. After all the retries I…
ishan3243 • 1,870 • 4 • 30 • 49
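
WAITING_FOR_RUNNER usually means no Task Runner is polling for the activity: either the activity names a workerGroup nobody is serving, or the EC2 resource it runs on never got a working Task Runner (common in VPCs without the right subnet, security group, or internet access). A minimal sketch of the two wiring styles, with placeholder IDs and subnet:

    # Hypothetical sketch: "runsOn" lets Data Pipeline provision the instance
    # and start Task Runner on it; "workerGroup" requires running Task Runner
    # yourself, otherwise the task waits forever.
    ec2_resource = {
        "id": "MyEC2Resource", "type": "Ec2Resource",
        "instanceType": "t2.micro", "terminateAfter": "1 Hour",
        "subnetId": "subnet-0123456789abcdef0",  # VPC subnet needs a route out
    }
    activity_managed = {
        "id": "EchoActivity", "type": "ShellCommandActivity",
        "command": "echo hello",
        "runsOn": {"ref": "MyEC2Resource"},
    }
    activity_self_hosted = {
        "id": "EchoActivityWG", "type": "ShellCommandActivity",
        "command": "echo hello",
        "workerGroup": "my-worker-group",  # needs your own Task Runner polling it
    }
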
0 votes, 2 answers

Extracting multiple RDS MySQL tables to S3

Rather new to AWS Data Pipeline, so any help will be appreciated. I have used the pipeline template RDStoS3CopyActivity to extract all contents of a table in RDS MySQL. It seems to be working well. But there are 90 other tables to be extracted and…
dat789 • 1,923 • 3 • 20 • 26
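
Rather than cloning the template 90 times by hand, the per-table objects can be generated and written into one definition file (or several, if the per-pipeline object limit gets in the way). A minimal sketch with placeholder names, assuming an RdsDatabase object "rds_mysql" and a shared Ec2Resource "Ec2Instance" defined elsewhere in the same file:

    # Hypothetical sketch: one SqlDataNode/S3DataNode/CopyActivity trio per table.
    import json

    tables = ["table_a", "table_b", "table_c"]  # ... up to 90

    objects = []
    for t in tables:
        objects += [
            {"id": f"Src_{t}", "type": "SqlDataNode", "table": t,
             "selectQuery": f"SELECT * FROM {t}",
             "database": {"ref": "rds_mysql"}},
            {"id": f"Dst_{t}", "type": "S3DataNode",
             "directoryPath": f"s3://my-bucket/exports/{t}/"},
            {"id": f"Copy_{t}", "type": "CopyActivity",
             "input": {"ref": f"Src_{t}"}, "output": {"ref": f"Dst_{t}"},
             "runsOn": {"ref": "Ec2Instance"}},
        ]

    with open("rds-to-s3-definition.json", "w") as f:
        json.dump({"objects": objects}, f, indent=2)
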
0 votes, 1 answer

Is there a way to group my DynamoDB export tasks on one EMR cluster?

When I set up a recurring backup via the export function in the DynamoDB console, the task it creates automatically creates a new EMR cluster when it runs. Some of my tables need to be backed up but are fairly small. What I end up with is a huge…
David • 1,648 • 1 • 16 • 31
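
The console's export task spins up a cluster per table, but a hand-written pipeline can point several export activities at one shared EmrCluster resource via runsOn. A minimal sketch using HiveCopyActivity between DynamoDB and S3 data nodes as a stand-in for the export step, with placeholder table and bucket names:

    # Hypothetical sketch: one EMR cluster, several small-table exports on it.
    tables = ["small_table_1", "small_table_2", "small_table_3"]

    objects = [{"id": "SharedEmr", "type": "EmrCluster",
                "releaseLabel": "emr-5.23.0", "terminateAfter": "2 Hours"}]
    for t in tables:
        objects += [
            {"id": f"Ddb_{t}", "type": "DynamoDBDataNode", "tableName": t},
            {"id": f"S3_{t}", "type": "S3DataNode",
             "directoryPath": f"s3://my-backup-bucket/{t}/"},
            {"id": f"Export_{t}", "type": "HiveCopyActivity",
             "input": {"ref": f"Ddb_{t}"}, "output": {"ref": f"S3_{t}"},
             "runsOn": {"ref": "SharedEmr"}},
        ]
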
0 votes, 1 answer

How to change AWS environment variables with ShellCommandActivity

I want to dynamically increment my environment variables (dates) for my AWS Data Pipeline and was wondering if someone has achieved this through ShellCommandActivity by changing the config.json file? { "values": ..{} }
Sam.E • 175 • 2 • 10
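
One common alternative to rewriting config.json is to let the pipeline's expression language compute the date and hand it to the script as an argument, which the script then exports as an environment variable. A minimal sketch with a placeholder script location:

    # Hypothetical sketch: pass the scheduled date in rather than mutating config.
    shell_activity = {
        "id": "DailyJob", "type": "ShellCommandActivity",
        "scriptUri": "s3://my-bucket/scripts/run_job.sh",
        "scriptArgument": ["#{format(@scheduledStartTime, 'YYYY-MM-dd')}"],
        "runsOn": {"ref": "MyEC2Resource"},
    }
    # run_job.sh would then do something like:
    #   export RUN_DATE="$1"
    #   python my_report.py
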
0 votes, 2 answers

AWS Data Pipeline DynamoDB to S3 to Redshift including JsonPaths

I'm aware of the standard COPY from DynamoDB to Redshift, but that only works for schemas without maps and lists. I have several DynamoDB tables with maps and lists and I need to use jsonpaths to do the import to Redshift. So my question is, can I…
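
For what a jsonpaths-based COPY looks like in general: the jsonpaths file maps nested paths onto flat Redshift columns, and COPY is pointed at it with FORMAT AS JSON. A minimal sketch with placeholder bucket, table, and paths (the actual paths depend on how the DynamoDB data lands in S3):

    # Hypothetical sketch: jsonpaths file plus the matching COPY statement.
    import json

    jsonpaths = {"jsonpaths": [
        "$.id.S",               # scalar attribute
        "$.profile.M.name.S",   # value inside a map
        "$.tags.L[0].S",        # first element of a list
    ]}
    with open("jsonpaths.json", "w") as f:
        json.dump(jsonpaths, f, indent=2)

    copy_sql = """
        COPY public.my_table (id, profile_name, first_tag)
        FROM 's3://my-bucket/ddb-export/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
        FORMAT AS JSON 's3://my-bucket/jsonpaths.json';
    """
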
0 votes, 1 answer

How to extract only some values from a table in DynamoDB using a data pipeline?

I have a table in DynamoDB. I want to back up only some particular records to S3. Is there any way to do that using Data Pipeline?
cruck • 5 • 2
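
The built-in export copies whole tables, so filtering usually means running your own script (for example from a ShellCommandActivity). A minimal sketch using a filtered scan, assuming placeholder table and bucket names and a "status" attribute; note that a scan still reads the entire table:

    # Hypothetical sketch: scan with a filter and write matches to S3 as JSON lines.
    import json
    import boto3
    from boto3.dynamodb.conditions import Attr

    table = boto3.resource("dynamodb").Table("my_table")
    s3 = boto3.client("s3")

    items, kwargs = [], {"FilterExpression": Attr("status").eq("ACTIVE")}
    while True:
        page = table.scan(**kwargs)
        items.extend(page["Items"])
        if "LastEvaluatedKey" not in page:
            break
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

    body = "\n".join(json.dumps(i, default=str) for i in items)
    s3.put_object(Bucket="my-backup-bucket",
                  Key="partial/my_table.json", Body=body)
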
0 votes, 2 answers

How to provide a Redshift database password in a Python script in AWS Data Pipeline?

I am using Redshift and have to write some custom scripts to generate reports. I am using the AWS Data Pipeline CustomShellActivity for running my custom logic. I am using Python and boto3. I am wondering what is the safest way and, in fact, best practice…
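
One commonly recommended pattern is to keep the password out of the pipeline definition and script entirely and fetch it at run time from SSM Parameter Store (or Secrets Manager); the resource role then just needs permission to read and decrypt that parameter. A minimal sketch with a placeholder parameter name and region:

    # Hypothetical sketch: read a SecureString parameter at run time.
    import boto3

    ssm = boto3.client("ssm", region_name="us-east-1")
    password = ssm.get_parameter(
        Name="/redshift/report_password", WithDecryption=True
    )["Parameter"]["Value"]
    # ...pass `password` to the Redshift connection instead of hard-coding it.
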
0 votes, 1 answer

Copy specific data from one S3 bucket to another bucket using Data Pipeline

I have uploaded a JSON file to an AWS S3 bucket. As the JSON file is in the form of keys and values, I want to copy only specific keys' data into another S3 bucket using AWS Data Pipeline. To do this operation, what should I add to the pipeline definition? Anyone with…
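
One way to do this is to have the pipeline run a small script (e.g. via ShellCommandActivity) that reads the object, keeps only the wanted keys, and writes the result to the second bucket. A minimal sketch with placeholder bucket names, object key, and field list:

    # Hypothetical sketch: filter a JSON object's keys while copying it.
    import json
    import boto3

    s3 = boto3.client("s3")
    WANTED = {"id", "name", "email"}  # the keys to keep

    obj = s3.get_object(Bucket="source-bucket", Key="input/data.json")
    data = json.loads(obj["Body"].read())
    filtered = {k: v for k, v in data.items() if k in WANTED}

    s3.put_object(Bucket="dest-bucket", Key="output/data.json",
                  Body=json.dumps(filtered))
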
0 votes, 1 answer

Set up AWS Data Pipeline on a long-running EMR cluster

If I want to have a long-running EMR cluster and then set up Data Pipeline to do something on that cluster, how can I do it? Must I install Task Runner on this EMR cluster? Or will Task Runner be preinstalled? Or maybe there is…
lubom • 329 • 2 • 13
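
Task Runner is not preinstalled on a cluster you launch yourself; the usual pattern is to start it on the master node and point activities at its worker group instead of a runsOn resource. A sketch under those assumptions (the exact jar name and flags below are assumptions; check the Task Runner documentation for the current invocation):

    # Hypothetical sketch: self-hosted Task Runner plus a workerGroup activity.
    #
    # On the EMR master node, roughly:
    #   java -jar TaskRunner-1.0.jar --config credentials.json \
    #        --workerGroup=my-emr-workers --region=us-east-1
    #
    # Then target that worker group in the pipeline:
    activity = {
        "id": "RunOnMyCluster", "type": "ShellCommandActivity",
        "command": "hive -f /home/hadoop/my_query.hql",  # placeholder command
        "workerGroup": "my-emr-workers",
    }
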
0 votes, 1 answer

AWS data pipeline -"Full copy of rds mysql table to s3" no connection string in parameters?

I'm currently working on my first pipeline, copying data from RDS to S3. I'm following the guidelines provided by Amazon (see below). There isn't an RDS MySQL connection string field in my case. Anyone know why this might…
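
Even when the template's parameter form shows no connection-string field, the database object can be declared explicitly in the definition, either by RDS instance ID or by a JDBC connection string. A minimal sketch with placeholder values (field names per the RdsDatabase/JdbcDatabase object types):

    # Hypothetical sketch: two ways to declare the source database explicitly.
    rds_by_instance_id = {
        "id": "rds_mysql", "type": "RdsDatabase",
        "rdsInstanceId": "my-rds-instance",
        "username": "admin", "*password": "placeholder",
    }
    rds_by_jdbc_string = {
        "id": "rds_mysql_jdbc", "type": "JdbcDatabase",
        "connectionString": "jdbc:mysql://my-rds-host:3306/mydb",
        "jdbcDriverClass": "com.mysql.jdbc.Driver",
        "username": "admin", "*password": "placeholder",
    }
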
0 votes, 1 answer

Unable to add AWS DataPipeline activity using awscli

I have lots of DynamoDB tables to set up backups for in Data Pipeline. I am able to pass a JSON file via the AWS command line for 1 or 2 tables, which means the JSON file is working. However, when I pass a large JSON (with 50-100 DynamoDB tables) to…
Varun Chandak • 943 • 1 • 8 • 25
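
A very large definition can trip CLI argument and per-pipeline object limits; one workaround is to create a small pipeline per table programmatically and inspect the validation output of each call. A minimal sketch with boto3, where build_objects() stands in for whatever per-table object list is already being generated (boto3 expects objects in the {id, name, fields: [...]} shape, not the CLI file format):

    # Hypothetical sketch: one backup pipeline per table, with validation checks.
    import boto3

    dp = boto3.client("datapipeline", region_name="us-east-1")
    tables = ["table_1", "table_2"]  # ... 50-100 tables

    def build_objects(table):
        # placeholder: return this table's pipeline objects in boto3's
        # {"id": ..., "name": ..., "fields": [...]} format
        return []

    for t in tables:
        pid = dp.create_pipeline(name=f"backup-{t}", uniqueId=f"backup-{t}")["pipelineId"]
        result = dp.put_pipeline_definition(
            pipelineId=pid, pipelineObjects=build_objects(t))
        if result.get("errored"):
            print(t, result.get("validationErrors"))
            continue
        dp.activate_pipeline(pipelineId=pid)
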
0 votes, 1 answer

AWS DynamoDB - Data Pipeline real write capacity consumption

I've created a data pipeline that pulls data from S3 and pushes it into DynamoDB. The pipeline started running successfully. I've set the write capacity to 20,000 units; after a few hours the writing decreased by half, and now it's still running with a write…
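
The import only tries to consume a fraction of the table's provisioned write capacity, governed by the DynamoDB data node's throughput-percent setting, so a drop over time usually points at that ratio, the table's provisioned capacity, or the EMR cluster's mapper count rather than the pipeline itself slowing down. A sketch with assumed field names and placeholder values:

    # Hypothetical sketch: raise the share of provisioned WCU the import may use.
    ddb_target = {
        "id": "DDBDestinationTable", "type": "DynamoDBDataNode",
        "tableName": "my_table",
        "writeThroughputPercent": "1.0",  # up to 100% of provisioned capacity
    }
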