Questions tagged [amazon-data-pipeline]

Simple service to transfer data between Amazon data storage services, kick off Elastic MapReduce jobs, and connect with outside data services.

From the AWS Data Pipeline homepage:

AWS Data Pipeline is a web service that helps you reliably process and move data between different AWS compute and storage services as well as on-premises data sources at specified intervals. With AWS Data Pipeline, you can regularly access your data where it’s stored, transform and process it at scale, and efficiently transfer the results to AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon Elastic MapReduce (EMR).

AWS Data Pipeline helps you easily create complex data processing workloads that are fault tolerant, repeatable, and highly available. You don’t have to worry about ensuring resource availability, managing inter-task dependencies, retrying transient failures or timeouts in individual tasks, or creating a failure notification system. AWS Data Pipeline also allows you to move and process data that was previously locked up in on-premises data silos.

470 questions
0
votes
1 answer

How to schedule a Data Pipeline in different time zones (other than UTC)?

We have to configure a service so that it accounts for daylight saving time. I am trying to use the inTimeZone expression in the Data Pipeline Schedule object to call the service according to CST timings, but could not find a solution. Could anyone please suggest…
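Data Pipeline schedules run on fixed intervals expressed in UTC, so daylight saving shifts generally have to be handled around the schedule rather than by it. A hedged sketch of the approach the question hints at: keep the Schedule in UTC and convert the scheduled time to Central time inside the activity with the inTimeZone expression. The object names, dates, script path, and the Ec2Resource assumed to be defined elsewhere are all placeholders, not a verified solution:

    {
      "objects": [
        {
          "id": "DailySchedule",
          "type": "Schedule",
          "period": "1 day",
          "startDateTime": "2016-03-01T06:00:00"
        },
        {
          "id": "CallServiceActivity",
          "type": "ShellCommandActivity",
          "schedule": { "ref": "DailySchedule" },
          "runsOn": { "ref": "MyEc2Resource" },
          "command": "/home/ec2-user/call_service.sh #{format(inTimeZone(@scheduledStartTime,'America/Chicago'),'YYYY-MM-dd HH:mm:ss')}"
        }
      ]
    }

The #{...} expression is evaluated by Data Pipeline before the command is handed to the shell, so the script receives the already-converted local timestamp.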
0
votes
1 answer

Configure Public Key Auth with Amazon Data Pipeline, SFTP, and S3

From the following Data Pipeline ShellCommandWith (S)FTP Sample: The sample relies on having public key authentication configured to access the SFTP server. How do I configure public key authentication so that my Amazon Data Pipeline's…
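The public key side of this lives on the SFTP server: you generate a key pair (for example with ssh-keygen) and append the public key to the remote user's ~/.ssh/authorized_keys. The pipeline then only needs the matching private key available to whatever resource runs the activity. A rough sketch of that second half, not taken from the sample; the bucket, key file, host, and paths are placeholders:

    {
      "id": "SftpUploadActivity",
      "type": "ShellCommandActivity",
      "runsOn": { "ref": "MyEc2Resource" },
      "command": "aws s3 cp s3://my-bucket/keys/sftp_id_rsa /tmp/sftp_id_rsa && chmod 600 /tmp/sftp_id_rsa && echo 'put /tmp/output.csv /upload/output.csv' | sftp -i /tmp/sftp_id_rsa -o StrictHostKeyChecking=no -b - user@sftp.example.com"
    }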
0
votes
1 answer

How to access individual elements of a blob in DynamoDB using a Hive script?

I am transferring data from DynamoDB to S3 using a Hive script in AWS Data Pipeline. I am using a script like this: CREATE EXTERNAL TABLE dynamodb_table ( PROPERTIES STRING, EMAIL STRING, ............. ) STORED BY …
rightCoder
  • 281
  • 1
  • 3
  • 18
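If the blob in the PROPERTIES attribute is stored as a JSON string, one way to pull individual fields out of it is Hive's get_json_object function in the query that writes to the S3 table. A sketch only, wrapped in a HiveActivity with its other fields (EMR cluster, input/output nodes) omitted; the s3_export table and the $.firstName / $.lastName paths are made up for illustration:

    {
      "id": "ExportWithJsonFields",
      "type": "HiveActivity",
      "hiveScript": "INSERT OVERWRITE TABLE s3_export SELECT get_json_object(properties, '$.firstName'), get_json_object(properties, '$.lastName'), email FROM dynamodb_table;"
    }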
0
votes
2 answers

How to specify an EC2 computation resource for AWS Data Pipeline

I am trying to create different kinds of pipelines, but I am not able to specify an EC2 computation resource properly. I have my EC2 instances and I'm providing their subnet ID, but it still doesn't work. Do I have to create a different kind of…
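For reference, a minimal sketch of an Ec2Resource that an activity can point to via runsOn. Note that Data Pipeline launches (and later terminates) this instance itself rather than reusing one of your existing instances; the subnet ID, instance type, and role names below are placeholders:

    {
      "objects": [
        {
          "id": "MyEc2Resource",
          "type": "Ec2Resource",
          "instanceType": "t2.micro",
          "subnetId": "subnet-0123abcd",
          "role": "DataPipelineDefaultRole",
          "resourceRole": "DataPipelineDefaultResourceRole",
          "terminateAfter": "2 Hours"
        },
        {
          "id": "MyActivity",
          "type": "ShellCommandActivity",
          "runsOn": { "ref": "MyEc2Resource" },
          "command": "echo hello"
        }
      ]
    }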
0
votes
1 answer

How to see details of the EC2 instances created by Data Pipeline

I have created an EC2 instance using Data Pipeline. The terminateAfter field value was set to 2 hours. How can I verify the created EC2 instance's details (like IP, Java version, ...) using the AWS EC2 console?
0
votes
2 answers

Is There A Way To Only Copy Specific Columns From Redshift To S3 Using RedshiftCopyActivity?

I assume that copying from Redshift -> S3 can only be done with RedshiftCopyActivity. However, I can't seem to find a way to copy only specific columns to S3 (it only copies all columns). The reason I am doing this is that one of the columns in the…
Aditya Wirayudha
  • 1,024
  • 12
  • 19
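One workaround that often comes up: RedshiftCopyActivity copies whole tables, but Redshift's UNLOAD statement takes an arbitrary SELECT, so running it through a SqlActivity lets you export only the columns you want. A sketch under that assumption; the database reference, IAM role ARN, schema/table/column names, and bucket path are placeholders:

    {
      "id": "UnloadSelectedColumns",
      "type": "SqlActivity",
      "database": { "ref": "MyRedshiftDatabase" },
      "runsOn": { "ref": "MyEc2Resource" },
      "script": "UNLOAD ('SELECT col_a, col_b FROM my_schema.my_table') TO 's3://my-bucket/exports/my_table_' IAM_ROLE 'arn:aws:iam::123456789012:role/MyUnloadRole' DELIMITER ',' ADDQUOTES ALLOWOVERWRITE;"
    }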
0
votes
1 answer

Amazon Data Pipeline: List of acceptable types for Custom Format

I am using Data Pipeline to pipe a CSV from S3 into RDS. As part of this process, I'm using a DataFormat which is a CSV. According to the documentation, I can have STRING, DATETIME and INT. Are there other types that I can use? (namely date, floating…
Stephane Maarek
  • 5,202
  • 9
  • 46
  • 87
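For context, the column list on a CSV data format pairs a column name with a type; here is a sketch using only the three types the question already cites from the documentation (STRING, INT, DATETIME). Whether further types such as DOUBLE are accepted is exactly what is being asked, so none are shown; the column names are placeholders:

    {
      "id": "MyInputFormat",
      "type": "CSV",
      "column": [
        "customer_name STRING",
        "purchase_count INT",
        "created_at DATETIME"
      ]
    }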
0
votes
2 answers

Best way to automate a process to be run from command line (via AWS)

I am working on a web application to provide software as a web-based service using AWS, but I'm stuck on the implementation. I will be using a Content Management System (probably Joomla) to manage user logins and front-end tasks such as receiving…
0
votes
2 answers

AWS Data Pipeline service creates a new EC2 instance

I have created a new Data Pipeline to stop some instances, e.g. those tagged as auto-stop/auto-start. My command is something like this: aws ec2 describe-instances --region us-west-2 --filter "Name=tag:auto-stop,Values=yes"…
sakhunzai
  • 13,900
  • 23
  • 98
  • 159
0
votes
1 answer

Create New Pipeline on AWS

I created and configured a Data Pipeline to run AWS CLI commands that stop and start Amazon EC2 instances at scheduled intervals, but after I activated it, the pipeline status goes to FAILED, as in the picture below: enter image description…
0
votes
0 answers

DynamoDB cross-regional table copy only copies partial data

I've tried the default setup of Data Pipeline for a cross-region table copy, copying one table to another in the same region (eu-west-1). On pipeline activation, an EMR cluster is launched, runs for approximately 20 minutes, and is then terminated with pipeline…
Marius Grigaitis
  • 2,520
  • 3
  • 23
  • 30
0
votes
1 answer

Is it possible to run Data Pipeline ShellCommandActivity on existing EC2 instance that is stopped?

My final goal is to perform data transformation using an existing machine with preinstalled software - more exactly, the software is an R script that uses non-standard packages [possibly installed manually] - so I would rather start the existing…
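Data Pipeline does not start a stopped EC2 instance for you, but it can hand tasks to an instance you manage yourself: instead of runsOn, give the activity a workerGroup and run Task Runner on the existing (already running) machine with a matching worker group name. A sketch with placeholder names and script path:

    {
      "id": "RunRScript",
      "type": "ShellCommandActivity",
      "workerGroup": "my-r-box",
      "command": "Rscript /opt/scripts/transform.R"
    }

The instance still has to be up when the task is scheduled, and Task Runner on it has to be started with the same worker group (e.g. --workerGroup=my-r-box) so that it polls for and executes this activity.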
0
votes
1 answer

Use the name of the table from Amazon RDS in the output csv being sent to S3

I successfully managed to get a data pipeline to transfer data from a set of tables in Amazon RDS (Aurora) to a set of .csv files in S3 with a "copyActivity" connecting the two DataNodes. However, I'd like the .csv file to have the name of the table…
D. Woods
  • 3,004
  • 3
  • 29
  • 37
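One way to get the table name into the S3 key is to parameterize both data nodes with the same value, so each run writes to <table>.csv. A sketch only; the parameter plumbing and names are illustrative, and the data format plus the CopyActivity from the working pipeline are omitted:

    {
      "objects": [
        {
          "id": "SourceTable",
          "type": "SqlDataNode",
          "database": { "ref": "MyRdsDatabase" },
          "table": "#{myTableName}",
          "selectQuery": "SELECT * FROM #{myTableName}"
        },
        {
          "id": "OutputCsv",
          "type": "S3DataNode",
          "filePath": "s3://my-bucket/exports/#{myTableName}.csv"
        }
      ],
      "parameters": [
        { "id": "myTableName", "type": "String" }
      ]
    }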
0
votes
2 answers

Automating a 3rd-party API pull and push into AWS RDS SQL using Python

I wrote a Python script that will pull data from a 3rd party API and push it into a SQL table I set up in AWS RDS. I want to automate this script so that it runs every night (e.g., the script will only take about a minute to run). I need to find a…
ansonw
  • 1,559
  • 1
  • 16
  • 22
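If Data Pipeline is the tool of choice (cron on a small always-on instance or a scheduled Lambda are the usual alternatives for a one-minute job), the nightly run can be expressed as a daily Schedule plus a ShellCommandActivity that pulls the script from S3 and runs it. A sketch with placeholder names, times, and paths:

    {
      "objects": [
        {
          "id": "NightlySchedule",
          "type": "Schedule",
          "period": "1 day",
          "startDateTime": "2017-01-01T03:00:00"
        },
        {
          "id": "RunApiSync",
          "type": "ShellCommandActivity",
          "schedule": { "ref": "NightlySchedule" },
          "runsOn": { "ref": "MyEc2Resource" },
          "command": "aws s3 cp s3://my-bucket/scripts/api_to_rds.py . && python api_to_rds.py"
        }
      ]
    }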
0
votes
1 answer

Exporting a DynamoDB table as CSV using Data Pipeline

I am trying to export my DynamoDB data as a .CSV file to S3. I've used: { "id" : "DynamoDBDataType", "type" : "CSV", "column" : [ "Name GsaDynamoDBDataType", "Score INT", "DateOfBirth TIMESTAMP" ] } and associated it with the s3…
Ste-3PO
  • 1,165
  • 1
  • 9
  • 18