Questions tagged [data-pipeline]
168 questions
1
vote
0 answers
batch_size and max_time in LSTM Tensorflow
Background:
I am trying to model a multi-layered LSTM in TensorFlow. I am using a common function to unroll the LSTM:
tf.nn.dynamic_rnn
Here I am using time_major = True, so my data has to be in the format [max_time, batch_size, depth].
According to my…

Praveen
- 267
- 1
- 5
- 19
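For context, the layout difference is just a transpose of the first two axes. A minimal sketch (dimension values here are made up, and NumPy stands in for the tensor op; in TensorFlow the equivalent would be a transpose with perm=[1, 0, 2]):

```python
import numpy as np

# Hypothetical dimensions for illustration.
batch_size, max_time, depth = 4, 10, 8

# Input data is often produced batch-major: [batch_size, max_time, depth].
batch_major = np.random.rand(batch_size, max_time, depth)

# With time_major=True, tf.nn.dynamic_rnn expects [max_time, batch_size, depth];
# swapping the first two axes converts between the two layouts.
time_major = np.transpose(batch_major, (1, 0, 2))

print(time_major.shape)  # (10, 4, 8)
```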
1
vote
1 answer
Kinesis triggers lambda with small batch size
I have a Lambda which is configured as a consumer of a Kinesis data stream, with a batch size of 10,000 (maximal).
The lambda parses the given records and inserts them into Aurora PostgreSQL (using an INSERT command).
Somehow, I see that the lambda is…

Ronyis
- 1,803
- 16
- 17
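For a setup like this, the handler typically decodes the base64 record payloads and then issues one batched INSERT rather than one per record. A sketch under assumed record shapes (field names, table, and the FakeCursor-style DB cursor are illustrative; the real cursor would come from a Postgres driver such as psycopg2):

```python
import base64
import json

def parse_kinesis_records(event):
    """Decode the base64-encoded data blob of each Kinesis record."""
    rows = []
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        rows.append(json.loads(payload))
    return rows

def insert_rows(cursor, rows):
    """Insert all rows in one round trip instead of one INSERT per record.

    Table and column names are assumptions for illustration.
    """
    cursor.executemany(
        "INSERT INTO events (id, value) VALUES (%s, %s)",
        [(r["id"], r["value"]) for r in rows],
    )
```

Batching the insert this way usually matters more for throughput than the Kinesis batch size itself.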
1
vote
0 answers
Incremental update of the data in AWS S3
Incremental update of S3 buckets without natural keys
I need to design an ETL flow. OLTP systems share customer, product, campaign, and sales records via files. I want to transfer these files incrementally into AWS S3 buckets.
Assume that I…

user125687
- 85
- 1
- 4
- 15
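When the source rows lack natural keys, one common workaround is to derive a deterministic surrogate key by hashing the identifying columns, so repeated loads of the same logical record can be deduplicated. A minimal sketch (column choices are an assumption; any stable subset of fields works):

```python
import hashlib

def surrogate_key(row, columns):
    """Hash the identifying columns into a deterministic surrogate key.

    The same logical record always hashes to the same key, so a later
    incremental load can detect records it has already written to S3.
    """
    raw = "|".join(str(row[c]) for c in columns)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```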
1
vote
1 answer
AWS Datapipeline incorrect java version
I am trying to execute a jar file in my data pipeline, and it is erroring out in a fashion that suggests the version of Java installed in my pipeline is lower than the one required by the executable jar. I have tried to add a command…

Matt
- 31
- 2
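One common approach for this symptom is to install a newer JRE on the resource before invoking the jar, inside the ShellCommandActivity itself. A hedged pipeline-object sketch (package name, JVM path, jar path, and the Ec2Instance reference are all assumptions that depend on the AMI in use):

```json
{
  "id": "RunJarActivity",
  "type": "ShellCommandActivity",
  "runsOn": { "ref": "Ec2Instance" },
  "command": "sudo yum install -y java-1.8.0 && /usr/lib/jvm/jre-1.8.0/bin/java -jar /path/to/app.jar"
}
```

Invoking the freshly installed JVM by its full path avoids depending on which `java` the default PATH resolves to.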
1
vote
1 answer
Airflow supported cross data center?
We would like to use Apache Airflow to orchestrate work across global data centers (regions). From what I can tell, the only way to make this work is to give all tasks access/permissions to write directly to some cloud-exposed database. Does…

hhop
- 33
- 3
1
vote
1 answer
"Connection timed out (Connection timed out)" Error for SQLActivity
I get a connection timed out error on my Data Pipeline job that runs a simple SQL script. The script is stored in my S3. The Data Pipeline itself is in the us-east-1 region; my database is in us-east-2. When I first ran the pipeline I got the error…

Berra2k
- 318
- 2
- 5
- 16
1
vote
1 answer
Checking status of AWS Data Pipeline using Go SDK
Situation: I have 2 data pipelines that run on-demand. Pipeline B cannot run until Pipeline A has completed. I'm trying to automate running both pipelines in a single script/program but I'm unsure how to do all of this in Go.
I have some Go code…

the1337beauty
- 251
- 2
- 7
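The sequencing logic is language-agnostic: poll Pipeline A's state until it reaches a terminal value, then activate Pipeline B. A sketch of the loop shown in Python (the function name and injected `get_status` callable are illustrative; in Go the same loop would wrap the SDK's DescribePipelines call and read `@pipelineState`):

```python
import time

def wait_until_finished(get_status, poll_seconds=30, timeout_seconds=3600, sleep=time.sleep):
    """Poll get_status() until the pipeline reports a terminal state.

    get_status is any callable returning the pipeline's state string
    (e.g. a thin wrapper around the SDK's DescribePipelines call).
    """
    terminal = {"FINISHED", "FAILED", "CANCELED"}
    waited = 0
    while waited < timeout_seconds:
        state = get_status()
        if state in terminal:
            return state
        sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError("pipeline did not reach a terminal state in time")
```

Injecting the sleep function keeps the loop unit-testable without real waiting.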
1
vote
0 answers
Why does my Cloudformation Data Pipeline fail on my Ec2Resource?
I’m trying to run a Data Pipeline inside a CloudFormation stack. This stack references the exports of another stack, which contains a Redshift cluster. When I run it, I get an error stating "'Ec2Instance', errors = Internal error during validation…

Taran
- 11
- 2
1
vote
1 answer
Time Series Windowing for streaming applications
We are developing a data pipeline app using Kafka, Storm, and Redis. Real-time events from different systems are published to Kafka, and Storm does the event processing based on configured rules. State is managed in Redis.
We have a requirement to…

shatk
- 465
- 5
- 16
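A time window over a stream usually boils down to appending events and evicting those older than the window on each arrival. A minimal in-process sketch (class and method names are made up; in the pipeline described above, the deque's role would typically be played by state in Redis, e.g. a sorted set keyed by timestamp):

```python
from collections import deque

class SlidingWindow:
    """Keep events from the last `window_seconds` and aggregate on demand."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value) pairs in arrival order

    def add(self, timestamp, value):
        self.events.append((timestamp, value))
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            self.events.popleft()

    def total(self):
        return sum(v for _, v in self.events)
```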
1
vote
1 answer
What is the best way to automate replication of RDS (MySQL) schema to AWS Redshift?
We use Ruby scripts to migrate data from MySQL to Redshift (PostgreSQL). Currently we use YAML configuration files to maintain schema information (column names and types). So whenever a MySQL table is altered, we need to manually change the YAML…

Himanshu Kansal
- 510
- 7
- 17
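One way to retire the hand-maintained YAML is to read column names and types from MySQL's `information_schema.columns` and translate them into Redshift column definitions automatically. A hedged sketch of the type-mapping core (the mapping table is an assumption and is not exhaustive):

```python
# Assumed MySQL-to-Redshift type mapping for illustration; extend as needed.
MYSQL_TO_REDSHIFT = {
    "tinyint": "SMALLINT",
    "int": "INTEGER",
    "bigint": "BIGINT",
    "float": "REAL",
    "double": "DOUBLE PRECISION",
    "datetime": "TIMESTAMP",
    "text": "VARCHAR(65535)",
}

def redshift_column(name, mysql_type, char_length=None):
    """Render one Redshift column definition from information_schema values."""
    if mysql_type == "varchar":
        return f'"{name}" VARCHAR({char_length})'
    # Fall back to the widest VARCHAR for unmapped types.
    return f'"{name}" {MYSQL_TO_REDSHIFT.get(mysql_type, "VARCHAR(65535)")}'
```

Regenerating the target DDL from `information_schema` on each run means an ALTER on the MySQL side is picked up without touching any config file.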
1
vote
0 answers
Is there any blueprint for a data-pipeline?
I use Spark for data processing, but starting from the data sources (mostly CSV files) I would like to put in place a data pipeline with the right stages to control/test/manipulate data and deploy it to different "stages"…

Randomize
- 8,651
- 18
- 78
- 133
0
votes
0 answers
PyTest for DataPipelines
My project collects infrastructure metrics such as CPU, memory, disk, and hits for various servers and applications via the Splunk REST API, HTTP API calls, and shell scripts.
The Python code is procedural in nature.
I need to implement…

Yavnica Saini
- 33
- 1
- 1
- 8
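A common way to make procedural collector code pytest-friendly is to split the pure transformation out of the I/O (the Splunk/HTTP calls), so tests need no network. A sketch under assumed data shapes (function names and the sample format are illustrative):

```python
def summarize_metrics(samples):
    """Pure function: reduce raw (metric, value) samples to min/max/avg per metric."""
    out = {}
    for name, value in samples:
        stats = out.setdefault(name, {"min": value, "max": value, "sum": 0, "n": 0})
        stats["min"] = min(stats["min"], value)
        stats["max"] = max(stats["max"], value)
        stats["sum"] += value
        stats["n"] += 1
    return {
        k: {"min": s["min"], "max": s["max"], "avg": s["sum"] / s["n"]}
        for k, s in out.items()
    }

# pytest then exercises the pure part directly, with no Splunk access:
def test_summarize_metrics():
    result = summarize_metrics([("cpu", 10), ("cpu", 30), ("disk", 5)])
    assert result["cpu"] == {"min": 10, "max": 30, "avg": 20.0}
    assert result["disk"]["avg"] == 5.0
```

The fetch layer can then be faked or monkeypatched in a handful of integration tests.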
0
votes
0 answers
Kafka, Kafka-Connect and HDFS with docker compose
I am slowly working my way into the world of Docker Compose. I would like to create a data pipeline. I think something is not working with my connector. The CSV I later send to Kafka is read in, but the data from the connector is not sent to…

Mauz
- 1
- 1
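With Kafka Connect's HDFS sink, one frequent cause of "data read in but never lands in HDFS" is that the sink only flushes a file after `flush.size` records per topic partition. A hedged example of a sink-connector config (topic name, HDFS URL, and the flush threshold are assumptions to adapt):

```json
{
  "name": "hdfs-sink",
  "config": {
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "topics": "csv-topic",
    "hdfs.url": "hdfs://namenode:8020",
    "flush.size": "3",
    "tasks.max": "1"
  }
}
```

Setting `flush.size` low while testing makes it obvious whether records are actually reaching the connector.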
0
votes
0 answers
Is it possible to ALTER MULTIPLE HIVE VIEWS at once? (ACID-like schema changes?)
I have a "private" Hive database filled with 24 tables of data populated by externally located parquets in part of a Spark data pipeline.
I have a "public" Hive database intended for public (downstream) usage with 24 views selecting content out of…

Rimer
- 2,054
- 6
- 28
- 43
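Hive has no transactional, multi-statement DDL, so "all 24 views at once" can only be approximated by generating the ALTER VIEW statements and executing them back to back. A sketch of the generator (database and view names are placeholders):

```python
def alter_view_statements(view_names, public_db="public_db", private_db="private_db"):
    """Generate one ALTER VIEW per view; Hive offers no atomic multi-view DDL,
    so running these back to back minimizes (but cannot eliminate) the window
    in which the public views are inconsistent."""
    return [
        f"ALTER VIEW {public_db}.{v} AS SELECT * FROM {private_db}.{v}"
        for v in view_names
    ]
```

An alternative pattern is to build a second public database and swap readers over to it, which gets closer to atomicity than in-place ALTERs.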
0
votes
2 answers
How to execute Step based on condition at Transformation Level in Pentaho?
I know that I can use conditional execution at the Job level, like below.
But I want to use conditional execution at the Transformation level. For example, I have a simple Table Input step with a query like "select id from tableA". Now based on the value of…

Hoang Minh Quang FX15045
- 733
- 1
- 4
- 15