Questions tagged [data-pipeline]

168 questions
0
votes
0 answers

Are there any docs on how you envision the data engineer workflow looking with Mage?

We run several environments with different data residency, so we would like to follow a declarative GitOps workflow if possible. Essentially, data engineers write a new pipeline in Mage and create a PR, the PR gets merged and deployed to all Mage…
datacated
  • 1
  • 2
0
votes
0 answers

I am getting an error when pushing data to a PostgreSQL server in Luigi, even though the Python script works completely fine when run on its own

I was building a data pipeline that gets JSON data from a URL and changes it into CSV format, then I clean the data a bit and push it to a SQL server. Everything works fine except for the last task, where I am getting this error: RuntimeError: Unfulfilled…
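In Luigi, "RuntimeError: Unfulfilled dependencies at run time" generally means that a task Luigi just ran still reports itself (or a requirement) as incomplete afterwards. A minimal sketch of the usual shape of the final load step using luigi.contrib.postgres.CopyToTable, whose complete() check is backed by a marker table; all task, table, and connection names here are hypothetical:

```python
import luigi
from luigi.contrib.postgres import CopyToTable


class CleanData(luigi.Task):
    """Hypothetical upstream task that writes the cleaned CSV."""

    def output(self):
        return luigi.LocalTarget("cleaned.csv")

    def run(self):
        # ... fetch the JSON, convert to CSV, clean it ...
        with self.output().open("w") as f:
            f.write("id,value\n")


class PushToPostgres(CopyToTable):
    """Final task: copy the cleaned rows into PostgreSQL."""

    host = "localhost"
    database = "mydb"
    user = "myuser"
    password = "mypassword"
    table = "my_table"
    columns = [("id", "INT"), ("value", "TEXT")]

    def requires(self):
        return CleanData()

    def rows(self):
        # Yield one tuple per row; CopyToTable inserts them and records
        # success in its marker table, which is what complete() checks.
        with self.input().open("r") as f:
            next(f)  # skip the CSV header
            for line in f:
                yield line.rstrip("\n").split(",")


if __name__ == "__main__":
    luigi.build([PushToPostgres()], local_scheduler=True)
```

If the run() of the last task raises (for example because psycopg2 or the marker table is missing), the task never becomes complete and Luigi reports exactly this unfulfilled-dependency error.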
0
votes
0 answers

Airflow: map all task dependencies

I'm trying to find a way to extract all task dependencies. The idea is to find all SQL tasks (BigQuery) and all the tables they depend on, so I guess there is some sort of metadata DB; another option I could think of is reading the "Render" (render…
kncdwn
  • 1
  • 1
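One programmatic route (a sketch, not Airflow's only metadata option): load the DAGs with DagBag and walk each task's upstream_task_ids, pulling the SQL off operators that expose a sql attribute (as the BigQuery query operators do). This assumes the DAG files are importable wherever the script runs; extracting table names out of the SQL text is left open.

```python
from airflow.models import DagBag

# Parse the DAG folder configured for this Airflow installation.
dagbag = DagBag()

# For every task, record its upstream task ids and, when present, the SQL
# string, so BigQuery tasks can later be mapped to the tables they query.
dependencies = {}
for dag_id, dag in dagbag.dags.items():
    for task in dag.tasks:
        dependencies[(dag_id, task.task_id)] = {
            "upstream": sorted(task.upstream_task_ids),
            "sql": getattr(task, "sql", None),  # set on SQL-based operators
        }

for (dag_id, task_id), info in dependencies.items():
    print(dag_id, task_id, "<-", info["upstream"])
```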
0
votes
0 answers

Cloud Data Fusion parameterized pipeline using Argument Setter and Wrangler

Is there a way to create a reusable Data Fusion pipeline that can handle multiple table transformations? Example: I have 2 tables in a BigQuery dataset in raw format, and I would like to create a Data Fusion pipeline and load the transformed data in another…
0
votes
1 answer

Is there a configuration in Kafka to write multiple records to one S3 object?

I'm using an S3 Sink Connector to write records to S3 from Kafka. Eventually I will be using Kafka to capture CDC packets from my database and then writing these packets to S3. However, I don't want every single CDC packet, which in Kafka will be a…
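With the Confluent S3 Sink Connector this batching is controlled by the connector, not by Kafka itself: flush.size sets how many records go into one S3 object, and rotate.interval.ms optionally closes objects on a time boundary. A hedged sketch of submitting such a config through the Kafka Connect REST API; the Connect URL, connector name, topic, and bucket are placeholders:

```python
import requests

# Assumed local Kafka Connect REST endpoint and placeholder connector name.
CONNECT_URL = "http://localhost:8083/connectors/s3-cdc-sink/config"

config = {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "cdc-packets",
    "s3.bucket.name": "my-cdc-bucket",
    "s3.region": "us-east-1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    # Many records per S3 object: an object is only written once flush.size
    # records have accumulated, or when rotate.interval.ms elapses.
    "flush.size": "1000",
    "rotate.interval.ms": "600000",
}

resp = requests.put(CONNECT_URL, json=config, timeout=30)
resp.raise_for_status()
print(resp.json())
```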
0
votes
1 answer

How to generate kedro pipelines automatically (like DataEngineerOne does)?

Having seen the DataEngineerOne video "How To Use a Parameter Range to Generate Pipelines Automatically", I want to automate a pipeline that simulates an electronic circuit. I want to do a grid search over multiple central frequencies of a bandpass…
ilja
  • 109
  • 7
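One way to do this, sketched under the assumption that the simulation is a single node taking a centre frequency: build the node list in a loop over the frequency grid, binding each frequency with functools.partial and giving each output its own catalog name. The function and dataset names below are hypothetical.

```python
from functools import partial

from kedro.pipeline import Pipeline, node


def simulate_circuit(circuit_data, center_frequency):
    """Hypothetical simulation; stands in for the real bandpass model."""
    ...


def create_pipeline(center_frequencies=(1_000, 2_000, 5_000), **kwargs):
    """Build one simulation node per centre frequency in the grid."""
    nodes = []
    for freq in center_frequencies:
        nodes.append(
            node(
                partial(simulate_circuit, center_frequency=freq),
                inputs="circuit_data",                # shared raw input
                outputs=f"simulation_result_{freq}",  # one dataset per frequency
                name=f"simulate_{freq}",
            )
        )
    return Pipeline(nodes)
```

The frequency grid could equally be read from parameters.yml and passed into create_pipeline, which keeps the grid search configurable without touching the pipeline code.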
0
votes
3 answers

How to run a kedro pipeline interactively like a function

I would like to run kedro pipelines in a Jupyter notebook with different inputs, so something like this: data = catalog.load('my_dataset') params = catalog.load('params:my_params') pipelines['my_pipeline'](data=my_dataset, params=my_params) Is there…
ilja
  • 109
  • 7
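A way this is often done (a sketch, assuming pipelines plus your own data and params objects already exist in the notebook session, and that your kedro version accepts a runner call without an explicit hook manager): build a throwaway DataCatalog of MemoryDataSets and hand it to a runner.

```python
from kedro.io import DataCatalog, MemoryDataSet
from kedro.runner import SequentialRunner

# `data`, `params` and `pipelines` are assumed to exist in the notebook.
catalog = DataCatalog(
    {
        "my_dataset": MemoryDataSet(data),
        "params:my_params": MemoryDataSet(params),
    }
)

# Runs the pipeline in-process; datasets that are not registered in the
# catalog come back as an in-memory dict of outputs.
outputs = SequentialRunner().run(pipelines["my_pipeline"], catalog)
```

Swapping the MemoryDataSet contents between runs gives the "call the pipeline like a function with different inputs" behaviour the question asks about.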
0
votes
0 answers

"Kibana server is not ready yet" browser message with Kibana 8.2.0

I need to access the Kibana dashboard without any type of login/authentication. I'm only interested in passing data to learn these technologies. When I try to access the Kibana dashboard, I receive this message in the browser: Kibana server is not ready…
0
votes
1 answer

"Error: Forbidden" even though service account has function permission access

I am trying to deploy a data ingestion pipeline in Google Cloud Functions. When I trigger the URL, I get the following error: Error: Forbidden Your client does not have permission to get URL /entry-point from this server. I don't understand when…
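This 403 usually means the function requires authenticated invocation and the caller is not presenting an identity with roles/cloudfunctions.invoker (the service account used to deploy is not automatically the identity of whoever hits the URL). A hedged sketch of invoking the function with an ID token minted from Application Default Credentials, assuming a service-account credential is available in the environment; the URL is a placeholder:

```python
import requests
import google.auth.transport.requests
from google.oauth2 import id_token

# Placeholder endpoint; replace with your deployed function's URL.
FUNCTION_URL = "https://REGION-PROJECT.cloudfunctions.net/entry-point"

# Mint an ID token whose audience is the function URL, using ADC.
auth_request = google.auth.transport.requests.Request()
token = id_token.fetch_id_token(auth_request, FUNCTION_URL)

resp = requests.get(
    FUNCTION_URL,
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
print(resp.status_code, resp.text)
```

The alternative, if the endpoint is meant to be public, is to grant allUsers the invoker role on the function instead of authenticating each call.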
0
votes
1 answer

Is there any way we can generate an output file based on the input data in Benthos?

For example: Input Data: {"date":"03-11-22", "message":"This is message"}, {"date":"03-30-22", "message":"This is message"}, {"date":"04-03-22", "message":"This is message"}, {"date":"04-15-22", "message":"This is message"}, {"date":"08-18-22",…
Yash Chauhan
  • 174
  • 13
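For orientation only, here is the kind of grouping being asked about expressed in plain Python rather than as a Benthos config: route each record into an output file derived from its date field. The field names come from the sample input; the one-file-per-date split is an assumption about the intended grouping.

```python
import json
from collections import defaultdict

records = [
    {"date": "03-11-22", "message": "This is message"},
    {"date": "03-30-22", "message": "This is message"},
    {"date": "04-03-22", "message": "This is message"},
]

# Group records by their date value and write one output file per group.
groups = defaultdict(list)
for rec in records:
    groups[rec["date"]].append(rec)

for date, recs in groups.items():
    with open(f"output_{date}.json", "w") as fh:
        json.dump(recs, fh, indent=2)
```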
0
votes
1 answer

Cannot run AWS Data Pipeline job due to ListObjectsV2 operation: Access Denied

I've written some CDK code to programmatically create a data pipeline that backs up a DynamoDB table into an S3 bucket on a daily basis. But it keeps running into this error: amazonaws.datapipeline.taskrunner.TaskExecutionException: Failed to…
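ListObjectsV2 is covered by the s3:ListBucket permission, which applies to the bucket ARN itself rather than to bucket/*, and that is the grant the pipeline's resource role is typically missing in this situation. A sketch of attaching such an inline policy with boto3; the role name, policy name, and bucket are placeholders (in CDK the equivalent is usually a bucket.grantReadWrite(role) style grant on the resource role).

```python
import json

import boto3

iam = boto3.client("iam")

# Placeholders: the role the Data Pipeline resources run as, and the bucket.
ROLE_NAME = "DataPipelineDefaultResourceRole"
BUCKET = "my-dynamodb-backup-bucket"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # ListObjectsV2 maps to s3:ListBucket on the bucket ARN.
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
        {
            # Object-level access for writing the backup files themselves.
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
    ],
}

iam.put_role_policy(
    RoleName=ROLE_NAME,
    PolicyName="AllowBackupBucketAccess",
    PolicyDocument=json.dumps(policy),
)
```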
0
votes
1 answer

What Python libraries to use in Data Pipeline?

I have a .csv in Power BI and I need to automate a process to do daily uploads to BigQuery. First of all, what Python libraries should I keep in mind to develop a project like this? I don't know where to look. Thanks.
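The combination usually reached for here is pandas to read the CSV plus google-cloud-bigquery to load it (pandas-gbq is an alternative wrapper). A sketch with placeholder paths and table IDs, to be wired into whatever daily scheduler you use:

```python
import pandas as pd
from google.cloud import bigquery


def upload_csv_to_bigquery(csv_path: str, table_id: str) -> None:
    """Load a local CSV export into a BigQuery table, replacing its contents."""
    df = pd.read_csv(csv_path)

    client = bigquery.Client()  # uses Application Default Credentials
    job_config = bigquery.LoadJobConfig(write_disposition="WRITE_TRUNCATE")

    job = client.load_table_from_dataframe(df, table_id, job_config=job_config)
    job.result()  # block until the load job finishes


if __name__ == "__main__":
    # Placeholder file and fully-qualified table id.
    upload_csv_to_bigquery("export.csv", "my-project.my_dataset.my_table")
```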
0
votes
0 answers

"BigQuery Multi Table has no outputs. Please check that the sink calls addOutput at some point" error from the Multiple Database Tables plugin

I'm trying to ingest data from different tables within the same database, using the Data Fusion Multiple Database Tables plugin, into BigQuery tables via the multiple BigQuery tables sink. I write 3 different custom SQL queries and add them inside the plugin section…
0
votes
1 answer

Handling multiple inputs for command in Snakemake

I'm currently working on a project that involves using Snakemake to run svaba, a variant caller, on genome data. A svaba run can take multiple sample files but requires a flag in front of each file. For example: svaba -g.... -t s1.bam -t s2.bam -t…
James
  • 13
  • 2
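A common way to handle such per-file flags in Snakemake is to derive the repeated -t arguments in a params entry, so the shell command stays a single template. A Snakefile sketch; the sample names, reference path, output marker, and the exact svaba options are placeholders taken loosely from the question:

```python
SAMPLES = ["s1", "s2", "s3"]  # hypothetical sample names


rule run_svaba:
    input:
        bams=expand("{sample}.bam", sample=SAMPLES),
        genome="reference.fa",
    output:
        # Real svaba output files would be declared here; a touch() marker
        # keeps the sketch self-contained.
        touch("svaba_run.done"),
    params:
        # Prefix every sample BAM with its own -t flag.
        tumor_flags=lambda wildcards, input: " ".join(f"-t {b}" for b in input.bams),
    shell:
        "svaba run -g {input.genome} {params.tumor_flags}"
```

Because params can be a function of the rule's input, the flag list grows automatically with the number of BAMs without editing the shell string.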
0
votes
1 answer

Is it possible to create a data pipeline with Google Cloud Data Fusion using Multiple Database Tables with Update or Upsert?

After using the Multiple Database Tables plugin to load data into BigQuery, I would like to do an incremental load for every table in one data pipeline. I wonder if I can use UPSERT with the Multiple Database Tables plugin. How can I overcome this? Any advice?
wweer
  • 5
  • 3