Questions tagged [aws-data-pipeline]

Use amazon-data-pipeline tag instead

Simple service to transfer data between Amazon data storage services, kick off Elastic MapReduce jobs, and connect with outside data services.

80 questions
1 vote · 1 answer

# of records loaded through AWS Redshift

Is there a way through the AWS console to understand the number of records that got loaded into a Redshift table using AWS Data Pipeline?
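
One answer-style sketch (not from the question itself): after a COPY into Redshift, per-file row counts land in the STL_LOAD_COMMITS system table, which can be queried without the console. A minimal boto3 sketch using the Redshift Data API; the cluster identifier, database, and user are placeholders:

```python
import boto3

# Placeholders: cluster "my-cluster", database "dev", user "awsuser".
client = boto3.client("redshift-data")

# STL_LOAD_COMMITS records one row per file loaded by COPY,
# including how many lines were scanned from that file.
resp = client.execute_statement(
    ClusterIdentifier="my-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql=(
        "select query, trim(filename) as filename, lines_scanned, curtime "
        "from stl_load_commits order by curtime desc limit 20;"
    ),
)
print(resp["Id"])  # poll describe_statement / get_statement_result with this Id
```
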
1 vote · 0 answers

Crawler not creating table in data lake from Postgres partitioned table

My table is partitioned in Postgres. I have created a Glue crawler to create the table. I selected the option "Update all new and existing partitions with metadata from the table" under "Configure the crawler's output". Since it's partitioned, the table is…
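
For reference, the console option quoted above maps to the crawler's Configuration JSON. A hedged boto3 sketch; the crawler name, role, database, and Glue connection are placeholders, not taken from the question:

```python
import json
import boto3

glue = boto3.client("glue")

# "Update all new and existing partitions with metadata from the table"
# corresponds to InheritFromTable in the crawler's Configuration JSON.
glue.create_crawler(
    Name="postgres-partitions-crawler",                      # placeholder
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",   # placeholder
    DatabaseName="datalake",                                 # placeholder
    Targets={"JdbcTargets": [{
        "ConnectionName": "postgres-conn",  # placeholder Glue connection
        "Path": "mydb/public/%",            # schema path to crawl
    }]},
    Configuration=json.dumps({
        "Version": 1.0,
        "CrawlerOutput": {
            "Partitions": {"AddOrUpdateBehavior": "InheritFromTable"}
        },
    }),
)
```
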
1 vote · 0 answers

Using AWS Data Pipeline to move data from AWS RDS to S3

I was trying to move data from RDS to S3 as a backup. I used DBeaver on my local PC to establish a connection with AWS RDS and uploaded a CSV file. I then tried to create a Data Pipeline to send data from RDS to S3. Initially, I got an error DBInstance…
kiran · 11 · 3
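
A hedged sketch of the usual shape of such a pipeline, pushed with boto3. All ids, connection details, and paths below are placeholders rather than values from the question:

```python
import boto3

dp = boto3.client("datapipeline")
pid = dp.create_pipeline(name="rds-to-s3-backup",
                         uniqueId="rds-to-s3-backup-1")["pipelineId"]

# Minimal CopyActivity graph: RDS table -> CSV files in S3.
dp.put_pipeline_definition(
    pipelineId=pid,
    pipelineObjects=[
        {"id": "Default", "name": "Default", "fields": [
            {"key": "scheduleType", "stringValue": "ondemand"},
            {"key": "role", "stringValue": "DataPipelineDefaultRole"},
            {"key": "resourceRole", "stringValue": "DataPipelineDefaultResourceRole"},
        ]},
        {"id": "rds", "name": "rds", "fields": [
            {"key": "type", "stringValue": "RdsDatabase"},
            {"key": "rdsInstanceId", "stringValue": "my-rds-instance"},
            {"key": "username", "stringValue": "admin"},
            {"key": "*password", "stringValue": "secret"},
        ]},
        {"id": "source", "name": "source", "fields": [
            {"key": "type", "stringValue": "SqlDataNode"},
            {"key": "database", "refValue": "rds"},
            {"key": "table", "stringValue": "my_table"},
            {"key": "selectQuery", "stringValue": "select * from my_table"},
        ]},
        {"id": "dest", "name": "dest", "fields": [
            {"key": "type", "stringValue": "S3DataNode"},
            {"key": "directoryPath", "stringValue": "s3://my-backup-bucket/rds/"},
        ]},
        {"id": "ec2", "name": "ec2", "fields": [
            {"key": "type", "stringValue": "Ec2Resource"},
            {"key": "terminateAfter", "stringValue": "1 Hour"},
        ]},
        {"id": "copy", "name": "copy", "fields": [
            {"key": "type", "stringValue": "CopyActivity"},
            {"key": "input", "refValue": "source"},
            {"key": "output", "refValue": "dest"},
            {"key": "runsOn", "refValue": "ec2"},
        ]},
    ],
)
dp.activate_pipeline(pipelineId=pid)
```
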
1 vote · 1 answer

Data migration from S3 to RDS

I am working on a requirement where I am doing a multipart upload of a CSV file from an on-prem server to an S3 bucket. To achieve this, I create a presigned URL using AWS Lambda, and I upload the CSV file using this URL. Now, once I have the file in…
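
For the presigned-URL step described above, a minimal boto3 sketch; the bucket and key are placeholders. The downstream S3-to-RDS load would then typically be a separate COPY/INSERT job:

```python
import boto3

s3 = boto3.client("s3")

# Inside the Lambda handler: return a time-limited PUT URL for the CSV.
# Bucket and key are placeholders, not taken from the question.
url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "my-landing-bucket", "Key": "uploads/data.csv"},
    ExpiresIn=3600,  # seconds
)
# The on-prem client then uploads with:
#   requests.put(url, data=open("data.csv", "rb"))
```
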
1 vote · 0 answers

Which file format is suitable for unstructured data?

I am creating a data repository, more like a data lake, for a NoSQL DB. I have some fields which don't have a proper schema; they have mixed-type objects, like a field value of {a:2} or {b:2, c:4, a: {1,2}}, etc. I can use CSV format so I can save…
Manish Trivedi · 3,481 · 5 · 23 · 29
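
Since the records above have no fixed schema, line-delimited JSON (JSONL) is a common fit: each row is a self-describing object, which Glue/Athena can crawl. A small illustrative sketch (the `a: {1,2}` value from the question is written as a list, since `{1,2}` is not valid JSON):

```python
import json

# Mixed-shape records like those in the question; keys vary per row.
records = [
    {"a": 2},
    {"b": 2, "c": 4, "a": [1, 2]},
]

# One JSON object per line: no shared schema required, unlike CSV
# where every row must fit the same columns.
with open("records.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
```
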
1 vote · 1 answer

AWS Data Pipeline: Upload CSV file from S3 to DynamoDB

I'm attempting to migrate CSV data from S3 to DynamoDB using Data Pipeline. The data is not in a DynamoDB export format but instead a normal CSV. I understand that Data Pipeline is more typically used for import or export of the DynamoDB format rather…
Mike S. · 185 · 2 · 9
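
A common workaround when the CSV is not in DynamoDB export format is to skip Data Pipeline and bulk-load with boto3. A hedged sketch; the bucket, key, and table name are placeholders, and the CSV is assumed to contain the table's key attribute as a column:

```python
import csv
import io
import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("my-table")  # placeholder table

# Read the plain CSV straight from S3 (bucket/key are placeholders).
body = s3.get_object(Bucket="my-bucket", Key="data.csv")["Body"].read().decode("utf-8")

# batch_writer buffers and retries BatchWriteItem calls for us.
with table.batch_writer() as batch:
    for row in csv.DictReader(io.StringIO(body)):
        batch.put_item(Item=row)  # each CSV column becomes a string attribute
```
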
1 vote · 1 answer

Airflow - Tasks that write files locally (GCS)

I'm in the process of building a few pipelines in Airflow after having spent the last few years using AWS Data Pipeline. I have a couple of questions I'm foggy on and hope for some clarification. For context, I'm using Google Cloud Composer. In…
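
On the local-file part of the title specifically: in Composer, files a task writes locally are only reliable within that task, so the usual pattern is write-to-/tmp-then-upload. A sketch assuming Airflow 2 with the Google provider installed; the bucket and paths are placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.google.cloud.hooks.gcs import GCSHook


def extract_and_upload():
    # Local files only live on this worker for the duration of the task,
    # so write to /tmp and ship to GCS before returning.
    local_path = "/tmp/output.csv"
    with open(local_path, "w") as f:
        f.write("id,value\n1,foo\n")
    GCSHook().upload(bucket_name="my-bucket",          # placeholder
                     object_name="exports/output.csv",
                     filename=local_path)


with DAG("local_file_to_gcs", start_date=datetime(2021, 1, 1),
         schedule_interval=None, catchup=False) as dag:
    PythonOperator(task_id="extract_and_upload",
                   python_callable=extract_and_upload)
```
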
1 vote · 0 answers

Is there a way PigActivity in AWS Data Pipeline can read the schema from Athena tables created on S3 buckets

I have a lot of legacy Pig scripts that run on an on-prem cluster. We are trying to move to AWS Data Pipeline (PigActivity) and want these Pig scripts to read data from the S3 buckets where my source data would reside. On-prem Pig scripts use…
1 vote · 0 answers

ShellCommandActivity timing out despite setting 3 hours as the timeout value

I'm using a CloudFormation template to spin up an EC2 instance to execute a shell script. For the EC2 resource, I've specified the terminateAfter value as 3 Hours. Similarly, for the ShellCommandActivity I've specified the attemptTimeout value as 3…
user795028 · 113 · 10
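
For comparison, the two timeout fields mentioned above live on different objects. A hedged fragment of the pipeline definition in boto3 form; the ids and the script path are placeholders:

```python
# Fragment of pipelineObjects for put_pipeline_definition; ids/paths are placeholders.
shell_activity = {
    "id": "RunScript",
    "name": "RunScript",
    "fields": [
        {"key": "type", "stringValue": "ShellCommandActivity"},
        {"key": "scriptUri", "stringValue": "s3://my-bucket/run.sh"},
        {"key": "runsOn", "refValue": "MyEc2"},
        # Timeout for a single attempt of the activity.
        {"key": "attemptTimeout", "stringValue": "3 Hours"},
    ],
}
ec2_resource = {
    "id": "MyEc2",
    "name": "MyEc2",
    "fields": [
        {"key": "type", "stringValue": "Ec2Resource"},
        # Hard stop for the instance itself, independent of attemptTimeout.
        {"key": "terminateAfter", "stringValue": "3 Hours"},
    ],
}
```
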
1 vote · 1 answer

Has anyone used an AWS Systems Manager parameter in Data Pipeline to allocate a value to a pipeline parameter?

"id": "myS3Bucket", "type": "String", "default": "\"aws ssm get-parameters --names variable --query \"Parameters[*].{myS3Bucket:Value}\"\"" I tried this , Where I created a variable in AWS parameter and was able to retrieve the value using this…
1 vote · 2 answers

Spark Streaming scheduling best practices

We have a Spark Streaming job that runs every 30 mins and takes 15s to complete. What are the suggested best practices in this scenario? I am thinking I can schedule an AWS Data Pipeline to run every 30 mins so that EMR terminates after 15…
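
One of the patterns implied above, a transient cluster per run, looks roughly like this with boto3. The name, release label, instance types, and spark-submit arguments are all placeholders:

```python
import boto3

emr = boto3.client("emr")

# Transient cluster: run the step, then terminate, so you only pay for
# the short job plus cluster startup each 30-minute cycle.
emr.run_job_flow(
    Name="spark-30min-batch",     # placeholder
    ReleaseLabel="emr-6.10.0",    # placeholder
    Applications=[{"Name": "Spark"}],
    Instances={
        "InstanceCount": 3,
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "KeepJobFlowAliveWhenNoSteps": False,  # terminate after the step
    },
    Steps=[{
        "Name": "spark-job",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://my-bucket/job.py"],  # placeholder
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
```
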
1 vote · 1 answer

Processing parameters passed to SQL activity in AWS Data Pipeline

I am working with AWS Data Pipeline. In this context, I am passing several parameters from the pipeline definition to a SQL file as follows: s3://reporting/preprocess.sql,-d,RUN_DATE=#{@scheduledStartTime.format('YYYYMMdd')} My SQL file looks like…
Joy · 4,197 · 14 · 61 · 131
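
For context on the snippet above: the `#{@scheduledStartTime.format('YYYYMMdd')}` expression is expanded by Data Pipeline before the arguments reach the script. A hedged fragment showing where such arguments sit on a SqlActivity; the ids and the database ref are placeholders:

```python
# Fragment of pipelineObjects; ids and the database ref are placeholders.
sql_activity = {
    "id": "Preprocess",
    "name": "Preprocess",
    "fields": [
        {"key": "type", "stringValue": "SqlActivity"},
        {"key": "database", "refValue": "MyDatabase"},
        {"key": "scriptUri", "stringValue": "s3://reporting/preprocess.sql"},
        # Data Pipeline expands the #{...} expression before the run,
        # so the script receives the literal date string.
        {"key": "scriptArgument",
         "stringValue": "RUN_DATE=#{@scheduledStartTime.format('YYYYMMdd')}"},
    ],
}
```
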
1 vote · 0 answers

How to run multiple steps in AWS Data Pipeline using the AWS console

I have a use case of scheduling my Spark jobs on EMR. Every time, we will be spinning up a new cluster and running a Spark job. I went through the documentation provided by AWS, but it is not extensive enough to give a clear picture of how to do it. If any…
Raghav salotra · 820 · 1 · 11 · 23
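
In Data Pipeline terms, an EmrActivity accepts multiple step fields, one per EMR step, each in the comma-separated jar,arg,arg form. A hedged fragment; the cluster ref and step strings are placeholders:

```python
# Fragment of pipelineObjects; the EmrCluster ref and step strings are placeholders.
emr_activity = {
    "id": "RunSparkJobs",
    "name": "RunSparkJobs",
    "fields": [
        {"key": "type", "stringValue": "EmrActivity"},
        {"key": "runsOn", "refValue": "MyEmrCluster"},
        # One "step" field per EMR step, executed in order.
        {"key": "step", "stringValue":
            "command-runner.jar,spark-submit,s3://my-bucket/job1.py"},
        {"key": "step", "stringValue":
            "command-runner.jar,spark-submit,s3://my-bucket/job2.py"},
    ],
}
```
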
1 vote · 1 answer

Unresolved resource dependencies [DefaultSchedule] in the Resources block of the template

I am working with a CloudFormation script to create an AWS Data Pipeline. I have created the script according to the documentation, but I am facing one error: Template validation error: Template format error: Unresolved resource dependencies…
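
That particular error usually means the template refers to DefaultSchedule with a CloudFormation Ref, while inside AWS::DataPipeline::Pipeline a schedule is referenced through a field's RefValue pointing at another pipeline object's Id. A hedged fragment of the PipelineObjects property, written as a Python dict for illustration; the names are placeholders:

```python
# Fragment of the AWS::DataPipeline::Pipeline "PipelineObjects" property.
# "DefaultSchedule" is a pipeline-object Id, not a CloudFormation resource,
# so it must be wired with RefValue rather than {"Ref": "DefaultSchedule"}.
pipeline_objects = [
    {
        "Id": "DefaultSchedule",
        "Name": "Every1Day",
        "Fields": [
            {"Key": "type", "StringValue": "Schedule"},
            {"Key": "period", "StringValue": "1 Day"},
            {"Key": "startAt", "StringValue": "FIRST_ACTIVATION_DATE_TIME"},
        ],
    },
    {
        "Id": "Default",
        "Name": "Default",
        "Fields": [
            {"Key": "scheduleType", "StringValue": "cron"},
            # Correct: reference the schedule object by its Id.
            {"Key": "schedule", "RefValue": "DefaultSchedule"},
        ],
    },
]
```
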
1 vote · 1 answer

AWS Data Pipeline incorrect Java version

I am trying to execute a jar file in my Data Pipeline, and it is erroring out in a fashion that indicates the version of Java installed in my pipeline is lower than that required by the executable jar. I have tried to add a command…
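
A common workaround, assumed here rather than confirmed by the question, is to install the needed JRE on the resource before running the jar, e.g. in the ShellCommandActivity's command string. The package name and jar path are placeholders:

```python
# Placeholder command string for a ShellCommandActivity: install Java 8 on
# the Amazon Linux resource, then invoke the jar with that JRE explicitly.
command = (
    "sudo yum install -y java-1.8.0-openjdk && "
    "/usr/lib/jvm/jre-1.8.0/bin/java -jar my-app.jar"  # placeholder jar path
)
shell_activity_field = {"key": "command", "stringValue": command}
```
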