Questions tagged [aws-data-pipeline]

Use the amazon-data-pipeline tag instead

Simple service to transfer data between Amazon data storage services, kick off Elastic MapReduce jobs, and connect with outside data services.

80 questions
0 votes • 1 answer

Multiple S3 Inputs into Glue Pipeline

I have 3 separate data sources (files) in 3 separate S3 buckets. The schemas in these 3 sources are different from one another, but the timestamp is the same (hourly, in epoch). Previously, I used Glue to read from 1 bucket and apply transformations to…
summerNight • 1,446 • 3 • 25 • 52
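
A minimal sketch of one way to combine the three sources in a Glue job, assuming placeholder bucket paths, CSV input, and a shared epoch-hour column called `ts` (none of these names come from the question):

```python
# Hypothetical Glue job: read three S3 sources and join them on a shared epoch-hour timestamp.
from pyspark.context import SparkContext
from awsglue.context import GlueContext

glue_context = GlueContext(SparkContext.getOrCreate())

paths = {
    "src_a": "s3://example-bucket-a/data/",   # placeholder buckets
    "src_b": "s3://example-bucket-b/data/",
    "src_c": "s3://example-bucket-c/data/",
}

# Read each source as a DynamicFrame, then convert to Spark DataFrames for the join.
frames = {
    name: glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": [path]},
        format="csv",
        format_options={"withHeader": True},
    ).toDF()
    for name, path in paths.items()
}

# Join on the common hourly epoch timestamp column (assumed to be called "ts").
joined = frames["src_a"].join(frames["src_b"], "ts").join(frames["src_c"], "ts")
joined.show(5)
```
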
0 votes • 1 answer

Reading Partitioned Data through Athena in downstream jobs in pandas

I have 2 stages in my data pipeline: the first stage reads data from the source and dumps it to an intermediate bucket, and the next stage reads data from this intermediate bucket. I have Athena set up on the intermediate stage, and we are planning to read this partition…
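
One common way to pull Athena-registered partitions into pandas in a downstream job is the AWS SDK for pandas (awswrangler); a minimal sketch, with the database, table, and partition column names as placeholders:

```python
# Sketch: query a partitioned table through Athena and get a pandas DataFrame back.
import awswrangler as wr

df = wr.athena.read_sql_query(
    sql="SELECT * FROM intermediate_table WHERE dt = '2023-01-01'",  # placeholder names
    database="intermediate_db",
)
print(df.head())
```
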
0 votes • 0 answers

Completely deleting all resources related to AWS Glue and AWS Data Pipeline

I'm a student getting started with AWS (free tier). After realizing (I got billed) that I had exhausted my free tier for AWS Glue and Data Pipeline, I deleted all the resources that were billing me, even these two S3 buckets (mentioned in an image…
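
As a hedged illustration, leftover resources for these two services could be enumerated and deleted with boto3 along these lines (pagination omitted; run something like this only after confirming nothing listed is still needed):

```python
# Sketch: list and delete AWS Data Pipeline pipelines and AWS Glue jobs/crawlers with boto3.
import boto3

dp = boto3.client("datapipeline")
for p in dp.list_pipelines()["pipelineIdList"]:
    dp.delete_pipeline(pipelineId=p["id"])

glue = boto3.client("glue")
for job in glue.get_jobs()["Jobs"]:
    glue.delete_job(JobName=job["Name"])
for crawler in glue.get_crawlers()["Crawlers"]:
    glue.delete_crawler(Name=crawler["Name"])
```
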
0 votes • 1 answer

Batch file processing in AWS using Data Pipeline

I have a requirement to read a CSV batch file that was uploaded to an S3 bucket, encrypt the data in some columns, and persist this data in a DynamoDB table. While persisting each row in the DynamoDB table, depending on the data in each row, I need to…
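
A rough sketch of the shape this processing could take, independent of how Data Pipeline schedules it; the bucket, key, KMS key ID, column, and table names are all placeholders, not from the question:

```python
# Sketch: read a CSV from S3, KMS-encrypt selected columns, and write each row to DynamoDB.
import base64
import csv
import io

import boto3

s3 = boto3.client("s3")
kms = boto3.client("kms")
table = boto3.resource("dynamodb").Table("example-table")  # placeholder table

obj = s3.get_object(Bucket="example-bucket", Key="batch/input.csv")  # placeholder location
rows = csv.DictReader(io.StringIO(obj["Body"].read().decode("utf-8")))

for row in rows:
    # Encrypt a sensitive column before persisting (column name and key ID are placeholders).
    ciphertext = kms.encrypt(KeyId="alias/example-key",
                             Plaintext=row["ssn"].encode())["CiphertextBlob"]
    row["ssn"] = base64.b64encode(ciphertext).decode()
    table.put_item(Item=row)
```
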
0 votes • 1 answer

AWS IAM Setup for EC2 Resource in AWS Data Pipeline

I am having an issue getting AWS Data Pipeline to run on an EC2 Instance via a Shell Command Activity. I have been following the guide found here step by step:…
WolVes • 1,286 • 2 • 19 • 39
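
As a hedged illustration of the two roles a ShellCommandActivity on an EC2 resource generally needs (a pipeline role assumed by the Data Pipeline service and a resource role with an instance profile assumed by EC2), with role names as placeholders:

```python
# Sketch: create the pipeline role and the EC2 resource role/instance profile with boto3.
import json

import boto3

iam = boto3.client("iam")

def trust_policy(service):
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow",
                       "Principal": {"Service": service},
                       "Action": "sts:AssumeRole"}],
    })

# Role assumed by the Data Pipeline service itself.
iam.create_role(RoleName="MyDataPipelineRole",
                AssumeRolePolicyDocument=trust_policy("datapipeline.amazonaws.com"))

# Role + instance profile assumed by the EC2 instance that runs the ShellCommandActivity.
iam.create_role(RoleName="MyDataPipelineResourceRole",
                AssumeRolePolicyDocument=trust_policy("ec2.amazonaws.com"))
iam.create_instance_profile(InstanceProfileName="MyDataPipelineResourceRole")
iam.add_role_to_instance_profile(InstanceProfileName="MyDataPipelineResourceRole",
                                 RoleName="MyDataPipelineResourceRole")
# Permissions policies (S3, EC2, Data Pipeline actions) still need to be attached to both roles.
```
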
0 votes • 1 answer

Using multiple SQL queries through psql connecting from a remote client

I am connecting to a PostgreSQL DB through AWS Data Pipeline using a ShellCommandActivity. I have to delete the data from 60 tables and copy data into the tables from files. When a copy job fails, I want to roll back the table to its previous state so…
Subrahmanyam • 27 • 1 • 5
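
A minimal Python sketch of the rollback behaviour being described, using psycopg2 rather than the psql shell script from the question; connection details, table, and file names are placeholders:

```python
# Sketch: reload a table inside one transaction so a failed COPY rolls everything back.
import psycopg2

conn = psycopg2.connect(host="example-host", dbname="example-db",
                        user="example-user", password="example-pass")  # placeholders
try:
    with conn.cursor() as cur:
        cur.execute("DELETE FROM example_table")
        with open("/tmp/example_table.csv") as f:
            cur.copy_expert("COPY example_table FROM STDIN WITH (FORMAT csv)", f)
    conn.commit()          # the delete and the copy become visible together
except Exception:
    conn.rollback()        # on any failure the table keeps its previous contents
    raise
finally:
    conn.close()
```
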
0 votes • 1 answer

How can I create an Amazon Event Bus rule to handle an AWS Data Pipeline event?

We have an AWS Data Pipeline that copies data from S3 into Redshift (RedshiftCopyActivity). We are looking to call a Lambda function when the copy is complete. My understanding so far is: Amazon Event Bus is the recommended way to handle the…
blu • 12,905 • 20 • 70 • 106
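
A hedged sketch of wiring an EventBridge (Event Bus) rule to a Lambda target with boto3; the event pattern shown is only an assumption about what a Data Pipeline completion event might look like, not a confirmed schema, and the function name/ARN are placeholders:

```python
# Sketch: create an EventBridge rule and point it at a Lambda function.
import json

import boto3

events = boto3.client("events")
lambda_client = boto3.client("lambda")

rule_arn = events.put_rule(
    Name="datapipeline-copy-complete",
    # Assumed pattern -- verify the actual event source/detail emitted for your pipeline.
    EventPattern=json.dumps({"source": ["aws.datapipeline"]}),
    State="ENABLED",
)["RuleArn"]

function_arn = "arn:aws:lambda:us-east-1:123456789012:function:on-copy-complete"  # placeholder
events.put_targets(Rule="datapipeline-copy-complete",
                   Targets=[{"Id": "on-copy-complete", "Arn": function_arn}])

# Allow EventBridge to invoke the function.
lambda_client.add_permission(FunctionName="on-copy-complete",
                             StatementId="allow-eventbridge",
                             Action="lambda:InvokeFunction",
                             Principal="events.amazonaws.com",
                             SourceArn=rule_arn)
```
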
0 votes • 1 answer

How to read SSM parameters in a shell script in AWS Data Pipeline?

I'm setting up a data pipeline in AWS and plan to use the "Getting started using ShellCommandActivity" template to run a shell script. How can I pass credentials stored in an SSM parameter as a parameter to this script?
arve • 569 • 2 • 10 • 27
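
The question asks about a shell script, where `aws ssm get-parameter --with-decryption` would be the equivalent call; for illustration, here is the same lookup sketched with boto3, with the parameter name as a placeholder:

```python
# Sketch: fetch a (possibly SecureString) SSM parameter value at runtime.
import boto3

ssm = boto3.client("ssm")
value = ssm.get_parameter(Name="/example/db/password",  # placeholder parameter name
                          WithDecryption=True)["Parameter"]["Value"]
print("fetched parameter of length", len(value))
```
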
0 votes • 1 answer

Set up cross-account access for AWS S3

I have two AWS accounts, say A (for use of AWS services) and B (for S3). I want to access B's S3 bucket from account A's Data Pipeline service to manage data transfer within account B. I have an access key pair for account B. How can I set up S3 access to…
shiva • 11 • 2
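
One common pattern is a bucket policy on account B's bucket that grants account A's principal access; a hedged boto3 sketch, with the account ID, role, and bucket names as placeholders:

```python
# Sketch: attach a bucket policy in account B that lets a role from account A read/write the bucket.
import json

import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111111111111:role/DataPipelineDefaultRole"},  # account A
        "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
        "Resource": ["arn:aws:s3:::example-bucket-b", "arn:aws:s3:::example-bucket-b/*"],
    }],
}

# Run with account B credentials (e.g. the access key pair mentioned in the question).
s3 = boto3.client("s3")
s3.put_bucket_policy(Bucket="example-bucket-b", Policy=json.dumps(policy))
```
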
0 votes • 2 answers

Unable to create AWS Data Pipeline through serverless YAML template

I was creating a data pipeline for a DynamoDB export to S3. The template given for serverless YAML is not working with the "PAY_PER_REQUEST" billing mode. I created one using the AWS console and it worked fine, exported its definition, and tried to create it using the same…
0 votes • 0 answers

How to sync data from Redshift to DynamoDB

Is there a CloudFormation template for creating a data pipeline to sync data from Redshift to DynamoDB? Thanks, Vinod.
0 votes • 1 answer

Move data from S3 to Amazon Aurora Postgres

I have multiple files present in different buckets in S3. I need to move these files to Amazon Aurora PostgreSQL every day on a schedule. Every day I will get a new file and, based on the data, an insert or update will happen. I was using Glue for…
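
If the target is Aurora PostgreSQL, one documented route is the `aws_s3` extension's import function; a hedged sketch driven from Python, with the endpoint, table, bucket, and file names as placeholders:

```python
# Sketch: load a CSV from S3 into Aurora PostgreSQL using the aws_s3 extension.
import psycopg2

conn = psycopg2.connect(host="example-aurora-endpoint", dbname="example-db",
                        user="example-user", password="example-pass")  # placeholders
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE")
    cur.execute("""
        SELECT aws_s3.table_import_from_s3(
            'target_table', '', '(FORMAT csv, HEADER true)',
            aws_commons.create_s3_uri('example-bucket', 'daily/file.csv', 'us-east-1')
        )
    """)
conn.close()
```
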
0 votes • 1 answer

AWS Data Pipeline Creation Error Code: Throttling -- Rate Exceeded

Error Image 1 -> https://i.stack.imgur.com/16YSg.png Error Image 2 -> https://i.stack.imgur.com/4bZkU.png
Atul • 155 • 2 • 10
0 votes • 1 answer

AWS Data Pipeline S3 CSV to DynamoDB JSON Error

I'm trying to insert several CSVs located in an S3 directory with AWS Data Pipeline, but I'm getting this error: at javax.security.auth.Subject.doAs(Subject.java:422) at…
0 votes • 1 answer

Best practices for setting up a data pipeline on AWS? (Lambda/EMR/Redshift/Athena)

*Disclaimer:* This is my first time ever posting on Stack Overflow, so excuse me if this is not the place for such a high-level question. I just started working as a data scientist and I've been asked to set up an AWS environment for 'external' data.…