Questions tagged [aws-data-pipeline]

Use the amazon-data-pipeline tag instead

Simple service to transfer data between Amazon data storage services, kick off Elastic MapReduce jobs, and connect with outside data services.

80 questions
0
votes
0 answers

AWS Data Pipeline: No policy attached to the role

I am trying to create a Data Pipeline to copy data from Postgres to an S3 bucket. I created an IAM role for the EC2 instance: { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ …
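
A hedged sketch of one way to resolve the "no policy attached" warning, using boto3 to attach an inline policy to the EC2 resource role; the role name, policy name, and actions here are illustrative assumptions, not the asker's actual setup:

    import json
    import boto3

    iam = boto3.client("iam")

    # Hypothetical minimal policy; scope Action/Resource to what the pipeline needs.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
            "Resource": "*",
        }],
    }

    # Attach the inline policy so the EC2 resource role is no longer empty.
    iam.put_role_policy(
        RoleName="DataPipelineDefaultResourceRole",
        PolicyName="DataPipelineS3Access",
        PolicyDocument=json.dumps(policy),
    )
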
0
votes
0 answers

Migrate an AWS Data Pipeline incremental copy of RDS table to Redshift

I have an AWS Data Pipeline that runs a query on a MySQL RDS DB and loads the result into Redshift, as described in the AWS docs. Now the Data Pipeline service is going away, so I need to migrate off of it. What would be an optimal implementation of…
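
One possible replacement, sketched under assumptions: an EventBridge-scheduled Lambda issuing a Redshift COPY through the Redshift Data API, with the incremental extract landed in S3 by Glue, DMS, or similar. Cluster, database, table, and role names are hypothetical:

    import boto3

    rsd = boto3.client("redshift-data")

    def handler(event, context):
        # COPY the latest incremental extract from S3 into a Redshift staging table.
        rsd.execute_statement(
            ClusterIdentifier="my-cluster",
            Database="analytics",
            DbUser="loader",
            Sql=(
                "COPY staging.orders "
                "FROM 's3://my-bucket/incremental/orders/' "
                "IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole' "
                "FORMAT AS CSV"
            ),
        )
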
0
votes
0 answers

How to migrate an AWS Data Pipeline job that calls pg_dump, to use maybe AWS Glue or Lambda instead?

The Data Pipeline job runs on a schedule and calls a shell script which ultimately calls pg_dump. I'd like to continue generating the pg_dump backups since they are useful, but move away from using Data Pipeline since I notice that AWS are soon to…
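
A sketch of one migration path: run the same pg_dump inside a container scheduled by EventBridge (for example as an ECS Fargate task), then upload the dump to S3. The host, database, user, and bucket names are hypothetical, and pg_dump must be present in the image:

    import subprocess
    import boto3

    DUMP_FILE = "/tmp/backup.dump"

    # Credentials are expected via PGPASSWORD or a .pgpass file.
    subprocess.run(
        ["pg_dump", "-h", "mydb.example.com", "-U", "backup_user",
         "-Fc", "-f", DUMP_FILE, "mydatabase"],
        check=True,
    )

    boto3.client("s3").upload_file(DUMP_FILE, "my-backup-bucket", "pg/backup.dump")
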
0
votes
0 answers

Finding error in AWS Data Pipeline step using CLI

How do I find the errorStackTrace or the hostname of a Data Pipeline object? I tried using the describe_objects API, but it doesn't return any information on the error or hostname.
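
Error details live on attempt objects rather than on the components themselves; a minimal sketch using boto3 (the pipeline ID is hypothetical):

    import boto3

    dp = boto3.client("datapipeline")
    PIPELINE_ID = "df-0123456789ABCDEF"

    # Attempt objects carry @errorStackTrace and hostname fields.
    attempts = dp.query_objects(pipelineId=PIPELINE_ID, sphere="ATTEMPT")
    detail = dp.describe_objects(pipelineId=PIPELINE_ID, objectIds=attempts["ids"])
    for obj in detail["pipelineObjects"]:
        for field in obj["fields"]:
            if field["key"] in ("@errorStackTrace", "hostname"):
                print(obj["id"], field["key"], field.get("stringValue"))
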
0
votes
0 answers

AWS DataPipeline ShellCommandActivity alternative

What can be an equivalent service to the AWS DataPipeline ShellCommandActivity? I want the activity to trigger a command on the local environment; can we achieve this using Step Functions? If not, what are the other AWS solutions?
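
Step Functions has no built-in "run a shell command on a host" state; one commonly suggested substitute, sketched here with a hypothetical instance ID and script path, is SSM Run Command, which a Step Functions task or a Lambda can invoke:

    import boto3

    ssm = boto3.client("ssm")
    ssm.send_command(
        InstanceIds=["i-0123456789abcdef0"],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": ["/opt/scripts/nightly_job.sh"]},
    )
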
0
votes
1 answer

AWS Glue metrics to populate job name, job status, start time, end time and elapsed time

I tried various metric options using glue.driver.*, but there is no clear way to get job name, job status, start time, end time and elapsed time in CloudWatch metrics. This info is already available under the Job runs history, but there is no way to get this on…
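
The same fields shown in the Job runs history are available from the Glue API, from which custom CloudWatch metrics could be published; a sketch (the job name is hypothetical):

    import boto3

    glue = boto3.client("glue")
    runs = glue.get_job_runs(JobName="my-glue-job", MaxResults=10)
    for run in runs["JobRuns"]:
        print(
            run["JobName"],
            run["JobRunState"],        # job status
            run.get("StartedOn"),      # start time
            run.get("CompletedOn"),    # end time
            run.get("ExecutionTime"),  # elapsed seconds
        )
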
0
votes
1 answer

AWS CDK DataPipeline: how to import an existing data pipeline

I have a data pipeline already running, previously created manually, and now I'd like to use CDK code to manage it. How can I do so (i.e., use the AWS CDK TypeScript library to find/import this data pipeline and manage it)? For example, in AWS SNS we could use…
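
Data Pipeline has no Topic.fromTopicArn-style import helper in CDK; only the low-level CfnPipeline construct exists. One workaround, sketched here with a hypothetical pipeline ID, is to read the live definition and re-declare it before bringing the resource under CloudFormation management (for example with cdk import):

    import boto3

    dp = boto3.client("datapipeline")
    definition = dp.get_pipeline_definition(pipelineId="df-0123456789ABCDEF")

    # definition["pipelineObjects"] can then be translated into the
    # pipeline_objects of an aws_cdk.aws_datapipeline.CfnPipeline.
    print(definition["pipelineObjects"])
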
0
votes
1 answer

AWS Data Pipeline keeps running into FileAlreadyExistsException

I basically followed this tutorial to set up a simple DataPipeline to export my DynamoDB table to S3. But whenever I try to run it, it keeps throwing Details: Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output…
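
The EMR step behind that template refuses to write into an S3 output directory that already exists, so the usual fixes are a per-run output path (for example one ending in #{@scheduledStartTime}) or emptying the prefix before each run; a sketch of the latter, with a hypothetical bucket and prefix:

    import boto3

    s3 = boto3.resource("s3")
    # Delete everything under the export prefix so the EMR step sees an empty target.
    s3.Bucket("my-export-bucket").objects.filter(Prefix="dynamodb-export/").delete()
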
0
votes
1 answer

How can I attach an additional EBS volume to EC2 from Data Pipeline?

I need to attach an additional EBS volume to my EC2 instance from Data Pipeline. I think the Data Pipeline service currently does not support an option to specify EBS volumes to attach to an EC2 resource. Is there any way to do that using Data…
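
The Ec2Resource object indeed exposes no EBS-volume field; a common workaround, sketched here with a hypothetical volume ID and device name, is a ShellCommandActivity that attaches the volume to whichever instance it is running on:

    import urllib.request
    import boto3

    # Instance metadata identifies the EC2 resource this activity runs on
    # (IMDSv1 shown for brevity; IMDSv2 requires a session token).
    instance_id = urllib.request.urlopen(
        "http://169.254.169.254/latest/meta-data/instance-id", timeout=2
    ).read().decode()

    boto3.client("ec2").attach_volume(
        VolumeId="vol-0123456789abcdef0",
        InstanceId=instance_id,
        Device="/dev/sdf",
    )
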
0
votes
1 answer

AWS DataPipeline via Cloudformation throws error 'type is not defined in fields'

I'm trying to deploy the "Export DynamoDB Table to S3" template via CloudFormation, but I'm getting a "type is not defined in fields" error from CloudFormation. I have a Key with the value of "type" for all of my PipelineObjects, with the exception of the…
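
For reference, every PipelineObject in the CloudFormation resource other than the Default object needs a field whose Key is "type"; a sketch of the expected shape, using a hypothetical S3 node:

    # Shape expected by AWS::DataPipeline::Pipeline PipelineObjects.
    s3_output_node = {
        "Id": "S3OutputLocation",
        "Name": "S3OutputLocation",
        "Fields": [
            {"Key": "type", "StringValue": "S3DataNode"},
            {"Key": "directoryPath", "StringValue": "s3://my-bucket/output/"},
        ],
    }
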
0
votes
0 answers

UnsupportedClassVersionError with mysql jdbc driver in AWS Data Pipeline

I am trying to run a Data Pipeline job in AWS. I added the field "Jdbc Driver Jar Uri" and placed the jar file in my S3 bucket, per the instructions here, because it seems the "Connector/J" that AWS Data Pipeline installs does not work. I'm using…
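
UnsupportedClassVersionError usually means the driver JAR was compiled for a newer Java than the pipeline's JVM runs; a sketch for checking a JAR's class-file version locally (major 52 = Java 8, 61 = Java 17; the JAR name is illustrative):

    import struct
    import zipfile

    with zipfile.ZipFile("mysql-connector-j-8.3.0.jar") as jar:
        name = next(n for n in jar.namelist() if n.endswith(".class"))
        # Class files start with a magic number (4 bytes), minor (2), major (2).
        magic, _minor, major = struct.unpack(">IHH", jar.read(name)[:8])
        print(hex(magic), "major version:", major)
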
0
votes
1 answer

How to update data when loading it between two S3 buckets using AWS Glue?

This is my first data analytics project, and I'm working on a data pipeline on AWS. The pipeline steps are as follows: Export data from RDS to S3 in Parquet format (done). Query data in S3 using Athena (done). Update the invalid data and…
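
The update step could plausibly be a Glue PySpark job that reads the exported Parquet, rewrites invalid values, and writes to the target bucket; a sketch with hypothetical paths and a hypothetical "amount" column:

    from awsglue.context import GlueContext
    from pyspark.context import SparkContext
    from pyspark.sql import functions as F

    spark = GlueContext(SparkContext.getOrCreate()).spark_session

    df = spark.read.parquet("s3://source-bucket/exported/")
    # Example fix-up: null out negative amounts.
    cleaned = df.withColumn(
        "amount",
        F.when(F.col("amount") < 0, None).otherwise(F.col("amount")),
    )
    cleaned.write.mode("overwrite").parquet("s3://target-bucket/cleaned/")
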
0
votes
1 answer

Data Pipeline Solution

We have a use case to build a data pipeline solution in which we need the following: the ability to have multiple steps (outputs from one step should feed as input to the next) and the ability to have multiple algorithms (SQL query or probably invoke REST…
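
Step Functions is one natural fit for chained steps whose output feeds the next input; a sketch (the ARNs and names are hypothetical, and each Task could be a Lambda running SQL or calling a REST endpoint):

    import json
    import boto3

    definition = {
        "StartAt": "RunQuery",
        "States": {
            "RunQuery": {
                "Type": "Task",
                "Resource": "arn:aws:lambda:us-east-1:123456789012:function:run-query",
                "Next": "CallRestApi",
            },
            "CallRestApi": {
                "Type": "Task",
                "Resource": "arn:aws:lambda:us-east-1:123456789012:function:call-api",
                "End": True,
            },
        },
    }

    boto3.client("stepfunctions").create_state_machine(
        name="multi-step-pipeline",
        definition=json.dumps(definition),
        roleArn="arn:aws:iam::123456789012:role/StepFunctionsRole",
    )
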
0
votes
1 answer

AWS DynamoDB export using Data Pipeline not working for eu-north-1?

I've prepared a data pipeline (in the eu-west-1 region) based on a template that would export a DynamoDB table to S3. My table is located in the eu-north-1 region, but when putting that under the parameter myDDBRegion I get the following…
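
Data Pipeline is only offered in a subset of regions, and eu-north-1 is not among them; for a table there, DynamoDB's native export (which requires point-in-time recovery on the table) sidesteps the service entirely. A sketch with hypothetical table and bucket names:

    import boto3

    ddb = boto3.client("dynamodb", region_name="eu-north-1")
    ddb.export_table_to_point_in_time(
        TableArn="arn:aws:dynamodb:eu-north-1:123456789012:table/my-table",
        S3Bucket="my-export-bucket",
        ExportFormat="DYNAMODB_JSON",
    )
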
0
votes
1 answer

Import CSV file in S3 bucket with semicolon-separated fields

I am using AWS Data Pipeline to copy SQL data to a CSV file in AWS S3. Some of the data has a comma between string quotes, e.g.: {"id":123455,"user": "some,user" .... } While importing this CSV data into DynamoDB, it takes the comma as the end of…
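
Quoted commas are legal CSV, and a quote-aware parser treats "some,user" as a single field; a minimal illustration with Python's csv module:

    import csv
    import io

    row = next(csv.reader(io.StringIO('123455,"some,user",active')))
    print(row)  # ['123455', 'some,user', 'active']
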