Questions tagged [data-pipeline]

168 questions
0 votes, 0 answers

Is it possible to configure dependencies in Azkaban to start a job after completion of either Job A or Job B, without requiring both of them to finish?

I have a scenario where I have three jobs in my Azkaban workflow. I want to ensure that Job C starts only after the completion of either Job A or Job B. It doesn't matter which of the two jobs finishes first; as soon as either Job A or Job B…
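
For reference, Azkaban's Flow 2.0 conditional workflows expose status macros (all_success, all_done, one_success, one_failed, ...) that cover this case. A minimal sketch of a .flow file, assuming a version with conditional-workflow support; job names and commands are placeholders, and whether one_success fires eagerly (as soon as the first parent succeeds) is worth verifying for the version in use:

    nodes:
      - name: JobA
        type: command
        config:
          command: echo "running A"

      - name: JobB
        type: command
        config:
          command: echo "running B"

      - name: JobC
        type: command
        # one_success: at least one dependency succeeded
        condition: one_success
        dependsOn:
          - JobA
          - JobB
        config:
          command: echo "running C"
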
0 votes, 1 answer

Data Flow ERROR java.lang.OutOfMemoryError: Java heap space

I have to create a pipeline that transfers data from BigQuery and saves it as a JSON file, but I got this error. The result of the SQL query is 30 million records. How can I improve this code? Error: [error] (run-main-0) java.lang.OutOfMemoryError: Java…
P.pp • 9 • 2
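
The usual cure for this class of error is to never materialize all 30 million rows in one process. A hedged sketch in Python with Apache Beam (the sbt-style stack trace suggests the asker is on Scala, so this only illustrates the approach; the query and output path are placeholders):

    import json
    import apache_beam as beam

    # Workers each read a slice of the BigQuery result and write JSON lines,
    # so no single process holds the full result set in memory.
    with beam.Pipeline() as p:
        (
            p
            | "Read" >> beam.io.ReadFromBigQuery(
                query="SELECT * FROM `project.dataset.table`",  # placeholder
                use_standard_sql=True,
            )
            | "ToJson" >> beam.Map(lambda row: json.dumps(row, default=str))
            | "Write" >> beam.io.WriteToText(
                "gs://my-bucket/output/result",  # placeholder
                file_name_suffix=".json",
            )
        )

If the code must stay single-process, the same idea applies: iterate the query result page by page and write each page out, rather than collecting everything into one list first.
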
0 votes, 2 answers

Error: "zsh: no matches found: apache-beam[gcp]" while installing Apache Beam

I am working on a project and trying to install Apache Beam from the terminal using this command: pip3 install apache-beam[gcp]. However, I get this error: zsh: no matches found: apache-beam[gcp]. I created a virtual env using these commands: pip3…
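
This error comes from the shell, not from pip: zsh treats the square brackets in apache-beam[gcp] as a glob pattern and aborts when nothing matches. Quoting (or escaping) the argument resolves it:

    # quote so zsh passes the brackets through to pip verbatim
    pip3 install 'apache-beam[gcp]'

    # escaping the brackets works too
    pip3 install apache-beam\[gcp\]
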
0 votes, 0 answers

How is result data published by the respective boards available on other portals just a few minutes later?

I was wondering how the results published by state boards like Bihar, MP, MH, and Jharkhand become available on other result portals like Indiaresults.com. Is there any way to copy the whole dataset without having access to the DB or a server-side script? How…
Sanjay Kumar • 145 • 1 • 1 • 10
0 votes, 1 answer

TensorFlow: how to add a property to an execution object in the MLMD MetadataStore?

I'm using the MLMD MetadataStore to manage the data pipelines, and I need to add an execution property in MLMD so I can retrieve it later. I'm trying to add it with this: from ml_metadata.proto import metadata_store_pb2 from ml_metadata.metadata_store…
natielle • 380 • 3 • 14
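
For context, MLMD distinguishes typed properties, which must be declared on the ExecutionType, from custom_properties, which can be set ad hoc. A minimal sketch against an in-memory store (type and property names are placeholders):

    from ml_metadata import metadata_store
    from ml_metadata.proto import metadata_store_pb2

    # In-memory SQLite store, just for the sketch
    config = metadata_store_pb2.ConnectionConfig()
    config.sqlite.SetInParent()
    store = metadata_store.MetadataStore(config)

    # A typed property must be declared on the ExecutionType first
    execution_type = metadata_store_pb2.ExecutionType()
    execution_type.name = "Trainer"  # placeholder
    execution_type.properties["my_property"] = metadata_store_pb2.STRING
    type_id = store.put_execution_type(execution_type)

    execution = metadata_store_pb2.Execution()
    execution.type_id = type_id
    execution.properties["my_property"].string_value = "some value"
    # custom_properties need no prior declaration
    execution.custom_properties["run_label"].string_value = "experiment-1"
    [execution_id] = store.put_executions([execution])

    # Retrieve the property later
    stored = store.get_executions_by_id([execution_id])[0]
    print(stored.properties["my_property"].string_value)
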
0 votes, 0 answers

Migrating Aurora DB cluster to Snowflake and daily incremental refresh

I am looking to migrate multiple Aurora DB clusters, around a few TB each, to Snowflake, and to perform a daily incremental refresh. I am wondering about the best practices and tools for achieving this objective. Should I consider the path of Aurora DB cluster…
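
One commonly cited pattern, sketched rather than prescribed (bucket, table, and credential names are assumptions): land incremental exports from Aurora, e.g. via AWS DMS, in S3, and let Snowpipe auto-ingest each new file:

    -- External stage over the S3 landing area
    CREATE OR REPLACE STAGE aurora_stage
      URL = 's3://my-bucket/aurora-export/orders/'
      CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...');

    -- Snowpipe appends each newly arrived file to a staging table
    CREATE OR REPLACE PIPE orders_pipe AUTO_INGEST = TRUE AS
      COPY INTO staging.orders
      FROM @aurora_stage
      FILE_FORMAT = (TYPE = PARQUET)
      MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

AUTO_INGEST additionally requires S3 event notifications wired to the pipe's queue, and merging the staged increments into final tables is a separate step.
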
0 votes, 1 answer

Beam pipeline Spark runner issue

I have a Beam pipeline that reads from a Kinesis stream, deserializes the protobuf data inside, converts it to a byte array, and writes it to another Kinesis stream (just a dummy pipeline). This pipeline executes successfully if I run mvn compile exec:java…
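
A first thing to check when a pipeline works under the direct runner but not on Spark: whether the Spark runner is on the classpath and actually selected. Following the Beam quickstart convention (the profile name comes from Beam's example archetype; the main class here is a placeholder):

    mvn compile exec:java \
      -Dexec.mainClass=com.example.MyKinesisPipeline \
      -Dexec.args="--runner=SparkRunner --sparkMaster=local[4]" \
      -Pspark-runner
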
0 votes, 2 answers

Stream data between tasks in a pipeline orchestration tool (Prefect/Dagster/Airflow)

How can I stream data between tasks in a workflow with the help of a data pipeline orchestration tool like Prefect, Dagster or Airflow? I am looking for a good data pipeline orchestration tool. I think I have a fairly decent overview now of what…
phobic • 914 • 10 • 24
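
As a point of reference: none of the three orchestrators streams bytes between concurrently running tasks out of the box; task results are handed over only once a task finishes. The usual workaround is to pass a reference (a path or URI) between tasks and stream inside each task. A minimal sketch with Prefect 2 (paths are placeholders):

    from prefect import flow, task

    @task
    def extract(path: str) -> str:
        # Write raw data to shared storage; return a reference, not the data
        with open(path, "w") as f:
            f.write("record-1\nrecord-2\n")
        return path

    @task
    def transform(path: str) -> str:
        out_path = path + ".transformed"
        # Stream line by line inside the task boundary
        with open(path) as src, open(out_path, "w") as dst:
            for line in src:
                dst.write(line.upper())
        return out_path

    @flow
    def etl():
        raw = extract("/tmp/raw.txt")  # placeholder path
        transform(raw)

    if __name__ == "__main__":
        etl()
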
0 votes, 0 answers

Airflow log by attempts takes too long to show how the process is going

My team has developed some pipelines in Airflow, and we are really amazed at how we can set multiple tasks to run so that data flows from sources directly into our data lake. However, we have some complex tasks, and logging can take up to 40 minutes to be…
0 votes, 1 answer

Azure Data Factory: retrieve the next pagination link (decoded) from response headers in a Copy Data activity

I have created a Copy Data activity in Azure Data Factory, and this data pipeline pulls data from an API (via a REST source) and writes the response body (JSON) to a file kept in Azure Blob Storage. The API which I am fetching the…
0 votes, 1 answer

How do I trigger Apache Beam side inputs periodically?

I have a Dataflow Pipeline with streaming data, and I am using an Apache Beam Side Input of a bounded data source, which may have updates. How do I trigger a periodic update of this side input? E.g. The side input should be refreshed once every 12…
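
This is Beam's documented "slowly updating side input" pattern: a PeriodicImpulse fires on an interval, the bounded source is re-read on each firing, and windowing makes the latest snapshot available as a side input. A sketch in Python (the lookup reader and the main source are placeholders):

    import apache_beam as beam
    from apache_beam.transforms import window
    from apache_beam.transforms.periodicsequence import PeriodicImpulse

    REFRESH_SECONDS = 12 * 60 * 60  # refresh every 12 hours

    def read_lookup_table(_):
        # Placeholder: re-read the bounded source (file, table, ...) here
        return {"key": "value"}

    with beam.Pipeline() as p:
        side = (
            p
            | "Tick" >> PeriodicImpulse(fire_interval=REFRESH_SECONDS,
                                        apply_windowing=True)
            | "Reload" >> beam.Map(read_lookup_table)
        )
        main = (
            p
            | "ReadStream" >> beam.io.ReadFromPubSub(
                topic="projects/my-project/topics/my-topic")  # placeholder
            | "Window" >> beam.WindowInto(window.FixedWindows(60))
        )
        _ = main | "Enrich" >> beam.Map(
            lambda msg, lookup: (msg, lookup.get("key")),
            lookup=beam.pvalue.AsSingleton(side),
        )
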
0 votes, 0 answers

How to design a cron job to transfer data from a DigitalOcean database (MySQL) into Google BigQuery hourly?

In my workplace I was tasked with the work below: what is the most cost-effective method to create a cron job that would run hourly (or maybe twice a day) to copy the company application's new data from a DigitalOcean database (MySQL) into a Google…
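
A cost-effective baseline is a small script on cron that copies only rows newer than the last watermark and appends them to BigQuery. A hedged sketch; table, column, connection details, and watermark persistence are all assumptions:

    import pymysql
    from google.cloud import bigquery

    def sync(last_sync_ts: str) -> None:
        # Incremental pull: only rows changed since the previous run
        conn = pymysql.connect(host="db.example.com", user="app",
                               password="...", database="app")  # placeholders
        with conn.cursor(pymysql.cursors.DictCursor) as cur:
            cur.execute("SELECT * FROM orders WHERE updated_at > %s",
                        (last_sync_ts,))
            rows = cur.fetchall()
        conn.close()

        # Make values JSON-safe (datetimes -> ISO strings), then append
        payload = [
            {k: (v.isoformat() if hasattr(v, "isoformat") else v)
             for k, v in row.items()}
            for row in rows
        ]
        client = bigquery.Client()
        errors = client.insert_rows_json("project.dataset.orders", payload)
        if errors:
            raise RuntimeError(f"BigQuery insert failed: {errors}")

Scheduling is then an ordinary crontab entry such as 0 * * * * python3 /opt/sync.py (path is a placeholder). For large hourly batches, a file-based load job is cheaper than streaming inserts.
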
0 votes, 0 answers

Which services should I use to periodically load data from multiple data sources, aggregate it, and provide fast search?

Please propose a solution design for my case. The data comes from various sources: some from APIs, some from CSV files. A user will search using filters. Ex: product data (source 1) and product reviews (source 2). A user will search for a product with its…
jasy • 1 • 1
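
One common shape for this kind of requirement, sketched rather than prescribed: a scheduled job pulls from each source, denormalizes reviews onto their products, and indexes the merged documents into a search engine such as Elasticsearch/OpenSearch, which then serves the filtered searches. Index and field names below are assumptions:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

    def index_products(products, reviews_by_product_id):
        for product in products:
            doc = {
                **product,
                # Denormalize: one document answers product + review filters
                "reviews": reviews_by_product_id.get(product["id"], []),
            }
            es.index(index="products", id=product["id"], document=doc)
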
0 votes, 1 answer

ValueError when running Python function in data pipeline

I'm building a data pipeline using Python, and I'm running into an issue when trying to execute a certain function. The error message I'm receiving is: ValueError: could not convert string to float: 'N/A'. Here is the function in question: def…
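
Without the full function, only the generic cause can be shown: float("N/A") always raises exactly this ValueError, so sentinel strings have to be handled before conversion. A sketch (treating 'N/A' as missing is an assumption about the asker's data):

    from typing import Optional

    def to_float(value: str) -> Optional[float]:
        # Turn non-numeric sentinels like 'N/A' into None instead of crashing
        try:
            return float(value)
        except (TypeError, ValueError):
            return None

With pandas, pd.to_numeric(column, errors="coerce") achieves the same thing column-wide, mapping unparseable strings to NaN.
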
0 votes, 0 answers

Azkaban 3.44 conditional flow not working (including the example from the official documentation)

I'm trying to use conditional flows in Azkaban. When I submit/upload my project via the web node, I receive this error: Validator Directory Flow reports errors: Error loading flow yaml file sample.flow: Cannot create property=nodes for…
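
An error of the form "Cannot create property=nodes" usually means the YAML loader rejected the shape of the .flow file; also, conditional flows are a Flow 2.0 feature, so the project must be marked as Flow 2.0, and 3.44 may simply predate conditional-flow support (worth checking the release notes). A minimal well-formed pair of files, with placeholder job names:

    # flow20.project
    azkaban-flow-version: 2.0

    # sample.flow -- 'nodes' must be the top-level key, with list items
    # indented consistently beneath it
    nodes:
      - name: jobA
        type: command
        config:
          command: echo "A"

      - name: jobB
        type: command
        dependsOn:
          - jobA
        condition: one_success
        config:
          command: echo "B"
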