Questions tagged [data-pipeline]

168 questions
1 vote • 0 answers

batch_size and max_time in an LSTM in TensorFlow

Background: I am trying to model a multi-layered LSTM in TensorFlow. I am using a common function to unroll the LSTM, tf.nn.dynamic_rnn. Here I am using time_major=True, so my data has to be in the format [max_time, batch_size, depth]. According to my…
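
A minimal TF 1.x sketch of the shape contract the excerpt describes; all dimensions here are hypothetical:

    import tensorflow as tf  # TensorFlow 1.x API

    max_time, batch_size, depth = 20, 32, 50
    num_units, num_layers = 128, 2

    # With time_major=True, dynamic_rnn expects [max_time, batch_size, depth].
    inputs = tf.placeholder(tf.float32, [max_time, batch_size, depth])

    cells = [tf.nn.rnn_cell.LSTMCell(num_units) for _ in range(num_layers)]
    stacked = tf.nn.rnn_cell.MultiRNNCell(cells)

    # outputs: [max_time, batch_size, num_units]; state: tuple of LSTMStateTuples
    outputs, state = tf.nn.dynamic_rnn(stacked, inputs,
                                       time_major=True, dtype=tf.float32)

With time_major=False the same tensors would instead be shaped [batch_size, max_time, depth].
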
1 vote • 1 answer

Kinesis triggers lambda with small batch size

I have a Lambda configured as a consumer of a Kinesis data stream, with a batch size of 10,000 (the maximum). The Lambda parses the given records and inserts them into Aurora PostgreSQL (using an INSERT command). Somehow, I see that the Lambda is…
Ronyis • 1,803 • 16 • 17
1 vote • 0 answers

Incremental update of the data in AWS S3

Incremental update of S3 buckets without natural keys: I need to design an ETL flow. OLTP systems share customer, product, campaign, and sales records via files. I want to transfer these files incrementally into AWS S3 buckets. Assume that I…
user125687 • 85 • 1 • 4 • 15
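
One common pattern, sketched with boto3 (bucket, prefix, and file names are hypothetical): land each extract under a date partition, so downstream jobs read only the new prefix instead of rescanning the whole bucket.

    import boto3
    from datetime import date

    s3 = boto3.client("s3")
    partition = date.today().isoformat()

    # Each day's OLTP extract lands under its own dt= prefix; incremental
    # consumers then process just the newest partition.
    s3.upload_file("customer_extract.csv", "my-etl-landing",
                   f"customer/dt={partition}/customer_extract.csv")
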
1 vote • 1 answer

AWS Data Pipeline: incorrect Java version

I am trying to execute a jar file in my data pipeline, and it is erroring out in a fashion that indicates the version of Java installed in my pipeline is lower than the one required by the executable jar. I have tried to add a command…
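
A sketch of one workaround: have the activity install a newer JRE before launching the jar. The package name and paths assume an Amazon Linux AMI and are unverified; a real definition also needs the Default and Ec2Resource pipeline objects, omitted here for brevity.

    import boto3

    # Hypothetical ShellCommandActivity that installs Java 8 first.
    run_jar = {
        "id": "RunJar",
        "name": "RunJar",
        "fields": [
            {"key": "type", "stringValue": "ShellCommandActivity"},
            {"key": "command", "stringValue":
                "sudo yum install -y java-1.8.0-openjdk && "
                "java -version && java -jar /home/ec2-user/app.jar"},
            {"key": "runsOn", "refValue": "Ec2Instance"},
        ],
    }

    dp = boto3.client("datapipeline")
    dp.put_pipeline_definition(pipelineId="df-EXAMPLE",  # hypothetical id
                               pipelineObjects=[run_jar])
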
1 vote • 1 answer

Is Airflow supported across data centers?

We would like to use Apache Airflow to orchestrate work across global data centers (regions). From what I can tell, the only way to make this work is to give all tasks access/permission to write directly to some cloud-exposed database. Does…
hhop • 33 • 3
1 vote • 1 answer

"Connection timed out (Connection timed out)" Error for SQLActivity

I get a connection timed out error on my Data Pipeline job that runs a simple SQL script. The script is stored in S3. The data pipeline itself is in the us-east-1 region; my database is in us-east-2. When I first ran the pipeline I got the error…
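
Before touching the pipeline, a quick probe can tell whether the timeout is a networking gap (for example a security group that does not admit cross-region traffic) rather than a SQL problem. Host and credentials below are placeholders:

    import psycopg2

    try:
        conn = psycopg2.connect(
            host="mydb.xxxx.us-east-2.rds.amazonaws.com",
            port=5432, dbname="mydb", user="etl", password="...",
            connect_timeout=10,
        )
        print("reachable")
    except psycopg2.OperationalError as exc:
        # A timeout here usually points at security groups, routing, or
        # the cross-region hop, not at the SQL script itself.
        print("unreachable:", exc)
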
1 vote • 1 answer

Checking status of AWS Data Pipeline using Go SDK

Situation: I have two data pipelines that run on demand. Pipeline B cannot run until Pipeline A has completed. I'm trying to automate running both pipelines in a single script/program, but I'm unsure how to do all of this in Go. I have some Go code…
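
The flow is: poll DescribePipelines until Pipeline A reaches a terminal state, then call ActivatePipeline for B. The Go SDK exposes the same operations; the shape of it is sketched here with boto3, with the pipeline IDs and the terminal-state set as assumptions to verify against the Data Pipeline docs:

    import time
    import boto3

    dp = boto3.client("datapipeline")

    TERMINAL = {"FINISHED", "FAILED", "CANCELED"}  # assumed; verify in docs

    def wait_for(pipeline_id, poll_seconds=30):
        # DescribePipelines returns the pipeline's fields, including
        # '@pipelineState'; the Go SDK call returns the same structure.
        while True:
            desc = dp.describe_pipelines(pipelineIds=[pipeline_id])
            fields = desc["pipelineDescriptionList"][0]["fields"]
            state = next(f["stringValue"] for f in fields
                         if f["key"] == "@pipelineState")
            if state in TERMINAL:
                return state
            time.sleep(poll_seconds)

    # IDs are hypothetical: activate B only once A has finished cleanly.
    if wait_for("df-PIPELINE-A") == "FINISHED":
        dp.activate_pipeline(pipelineId="df-PIPELINE-B")
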
1 vote • 0 answers

Why does my CloudFormation Data Pipeline fail on my Ec2Resource?

I'm trying to run a Data Pipeline inside a CloudFormation stack. This stack references the exports of another stack, which contains a Redshift cluster. When I run it, I get an error stating "'Ec2Instance', errors = Internal error during validation…
1 vote • 1 answer

Time Series Windowing for streaming applications

We are developing a data pipeline app using Kafka, Storm, and Redis. Real-time events from different systems are published to Kafka, and Storm does the event processing based on configured rules. State is managed in Redis. We have a requirement to…
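
For the sliding-window part, one common Redis idiom is a sorted set scored by event time, pruned as the window advances. A minimal sketch (window length and key names are hypothetical):

    import time
    import redis  # assumes a reachable Redis instance

    r = redis.Redis()
    WINDOW_SECONDS = 300  # hypothetical 5-minute sliding window

    def record_event(key, event_id, ts=None):
        # Score each event by its timestamp, then drop everything that
        # has fallen out of the window.
        ts = ts or time.time()
        r.zadd(key, {event_id: ts})
        r.zremrangebyscore(key, "-inf", ts - WINDOW_SECONDS)

    def window_count(key):
        return r.zcard(key)

    record_event("events:login", "evt-123")
    print(window_count("events:login"))
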
1 vote • 1 answer

What is the best way to automate replication of RDS (MySQL) schema to AWS Redshift?

We use Ruby scripts to migrate data from MySQL to Redshift (PostgreSQL). Currently we use YAML configuration files to maintain schema information (column names and types), so whenever a MySQL table is altered, we need to manually change the YAML…
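
One way to remove the manual YAML step is to generate the Redshift DDL from MySQL's information_schema on each run. A sketch in Python (the question's scripts are Ruby; the type map here is partial and assumed):

    import pymysql  # assumes network access to the MySQL instance

    TYPE_MAP = {"int": "INTEGER", "bigint": "BIGINT", "varchar": "VARCHAR",
                "datetime": "TIMESTAMP", "text": "VARCHAR(65535)"}  # partial

    def redshift_ddl(conn, schema, table):
        # Read live column metadata so schema changes flow through
        # automatically instead of being hand-edited into YAML.
        with conn.cursor() as cur:
            cur.execute(
                "SELECT column_name, data_type, character_maximum_length "
                "FROM information_schema.columns "
                "WHERE table_schema=%s AND table_name=%s "
                "ORDER BY ordinal_position", (schema, table))
            cols = []
            for name, dtype, maxlen in cur.fetchall():
                rs_type = TYPE_MAP.get(dtype, "VARCHAR(65535)")
                if dtype == "varchar" and maxlen:
                    rs_type = f"VARCHAR({maxlen})"
                cols.append(f'"{name}" {rs_type}')
        return f"CREATE TABLE {table} (\n  " + ",\n  ".join(cols) + "\n);"

    conn = pymysql.connect(host="mysql-host", user="etl",
                           password="...", database="shop")  # placeholders
    print(redshift_ddl(conn, "shop", "orders"))
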
1 vote • 0 answers

Is there any blueprint for a data-pipeline?

I use Spark for data processing, but starting from the data sources (mostly CSV files) I would like to put in place a data pipeline with the right stages to control/test/manipulate data and deploy it to different "stages"…
Randomize • 8,651 • 18 • 78 • 133
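
A common blueprint is extract → validate → transform → load as separate, individually testable functions, parameterized by the deployment stage. A minimal PySpark sketch (paths and the quality gate are hypothetical):

    from pyspark.sql import SparkSession, DataFrame

    spark = SparkSession.builder.appName("pipeline").getOrCreate()

    def extract(path: str) -> DataFrame:
        return spark.read.option("header", True).csv(path)

    def validate(df: DataFrame) -> DataFrame:
        # Hypothetical quality gate: fail fast on empty input.
        if df.rdd.isEmpty():
            raise ValueError("no rows read from source")
        return df

    def transform(df: DataFrame) -> DataFrame:
        return df.dropDuplicates()

    def load(df: DataFrame, stage: str) -> None:
        # Each environment ("stage") writes to its own path.
        df.write.mode("overwrite").parquet(f"/data/{stage}/output")

    load(transform(validate(extract("/data/raw/input.csv"))), stage="dev")
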
0 votes • 0 answers

PyTest for data pipelines

My project is based on collecting infrastructure metrics such as CPU, memory, disk, and hits for various servers and applications, via the Splunk REST API, HTTP API calls, and shell scripts. The Python code is procedural in nature. I need to implement…
Yavnica Saini • 33 • 1 • 1 • 8
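
The usual first step is to factor the procedural script into small pure functions, which pytest can then exercise without hitting Splunk or the servers. A minimal sketch with a hypothetical parser:

    import pytest

    # Hypothetical pure function factored out of the collection script.
    def parse_cpu_metric(payload: dict) -> float:
        return float(payload["cpu_pct"])

    def test_parse_cpu_metric():
        assert parse_cpu_metric({"cpu_pct": "42.5"}) == 42.5

    def test_parse_cpu_metric_missing_key():
        with pytest.raises(KeyError):
            parse_cpu_metric({})

The REST and shell calls themselves can then be stubbed with pytest's monkeypatch fixture, so the tests never depend on the live infrastructure.
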
0 votes • 0 answers

Kafka, Kafka Connect, and HDFS with Docker Compose

I am slowly working my way into the world of Docker Compose. I would like to create a data pipeline. I think something is not working with my connector: the CSV I later send to Kafka is read in, but the data from the connector is not sent to…
Mauz • 1 • 1
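
A quick way to check whether the connector is the problem is to push its config and read its status through the Kafka Connect REST API. Host, topic, and connector settings below are assumptions:

    import requests

    BASE = "http://localhost:8083"  # assumed Connect REST port

    # PUT on /config is idempotent: it creates or updates the connector.
    config = {
        "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
        "topics": "csv-input",                # assumed topic name
        "hdfs.url": "hdfs://namenode:8020",   # assumed HDFS service name
        "flush.size": "3",
    }
    resp = requests.put(f"{BASE}/connectors/hdfs-sink/config", json=config)
    resp.raise_for_status()

    # The status endpoint reports RUNNING or FAILED per task, with the
    # stack trace for failures.
    print(requests.get(f"{BASE}/connectors/hdfs-sink/status").json())
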
0 votes • 0 answers

Is it possible to ALTER MULTIPLE HIVE VIEWS at once? (ACID-like schema changes?)

I have a "private" Hive database with 24 tables populated by externally located Parquet files as part of a Spark data pipeline. I have a "public" Hive database intended for public (downstream) usage, with 24 views selecting content out of…
Rimer • 2,054 • 6 • 28 • 43
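
Hive DDL is not transactional, so there is no true atomic multi-view change; the closest approximation is to regenerate the views in a loop and accept that readers may briefly see mixed versions. A sketch using PyHive (database and table names are hypothetical):

    from pyhive import hive  # assumes HiveServer2 is reachable

    cursor = hive.connect(host="hive-server").cursor()

    TABLES = ["customers", "orders"]  # the 24 tables in practice

    # Each ALTER VIEW commits independently; keep the loop short so the
    # window of mixed old/new views stays small.
    for t in TABLES:
        cursor.execute(
            f"ALTER VIEW public_db.{t} AS SELECT * FROM private_db.{t}"
        )
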
0 votes • 2 answers

How to execute a step based on a condition at the transformation level in Pentaho?

I know that I can use conditional execution at the job level, like below, but I want conditional execution at the transformation level. For example, I have a simple Table Input step with a query like "select id from tableA". Now, based on the value of…