Questions tagged [data-pipeline]
168 questions
0
votes
2 answers
How can I schedule a Python script in the cloud?
I am developing a Python script that downloads some Excel files from a web service. These two files are combined with another one stored locally on my computer to produce the final file. This final file is loaded into some database and Power BI…
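For a script like the one above, a low-maintenance option (assuming a small always-on cloud VM is acceptable) is a plain cron entry; serverless alternatives such as Cloud Scheduler triggering a Cloud Function exist but need more setup. The path and script name below are placeholders, not from the question:

```
# Hypothetical crontab entry on a cloud VM: run the download-and-merge
# script every day at 06:00 UTC, appending output to a log file.
0 6 * * * /usr/bin/python3 /home/user/pipeline.py >> /home/user/pipeline.log 2>&1
```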
0
votes
0 answers
Not able to connect Apache Flink to a NiFi source using the nifi-flink connector
I want to transfer files from the local system to Flink using NiFi. I have configured a pipeline in NiFi with a GetFile processor and an output port named "Data For Flink". On the Flink end I am using flink-connector-nifi_2.11 with Flink. Below is…

vishal
- 1
- 1
0
votes
1 answer
How to implement recursive algorithms in Apache Spark?
I have a problem where I want to implement a recursive algorithm in Spark, and am looking to see if there are any recommendations for building this in Spark, or for other data analytics frameworks that might be better suited.
E.g., the job needs to…

Nikhil Kothari
- 5,215
- 2
- 22
- 28
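Spark has no native recursion, so the usual pattern is to unroll the recursion into a loop that repeatedly joins a "frontier" of new rows against the base data until nothing new appears. The sketch below shows that fixed-point pattern with plain Python sets (transitive closure over edges) so the control flow is easy to follow; in Spark each set would be a DataFrame and the loop body a join plus a subtract:

```python
# Fixed-point iteration in place of recursion: keep joining the frontier
# against the edge set until no new pairs are produced.
def transitive_closure(edges):
    """Return all (src, dst) pairs reachable through the given edges."""
    closure = set(edges)
    frontier = set(edges)
    while frontier:
        # One "iteration" = join the current frontier with the original edges.
        new_pairs = {(a, d) for (a, b) in frontier for (c, d) in edges if b == c}
        frontier = new_pairs - closure   # keep only genuinely new pairs
        closure |= frontier
    return closure

print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
# → [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

Because each pass only adds rows, the loop is guaranteed to terminate on a finite input; in Spark you would also cache or checkpoint the growing DataFrame between iterations to keep the lineage from exploding.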
0
votes
0 answers
Data Pipeline: URL request in Google Cloud Function ends with "crash" on VPC Connector
I am having a small problem with my Cloud Function: it crashes with the message
Function execution took 242323 ms, finished with status: 'crash'
My Setup
There are two GCP projects set up. One is managed by Department A; I work in Department B and…

Totta Zetterlund
- 313
- 5
- 18
0
votes
1 answer
GCP Data Fusion: transfer multiple files from Azure storage to Google Storage
I am trying to transfer multiple (.csv) files under a directory from an Azure storage container to Google Storage (as .txt files) through Data Fusion.
From Data Fusion, I can successfully transfer a single file and convert it to a .txt file as part of…

Srini V
- 65
- 1
- 1
- 8
0
votes
1 answer
Why is the task status in DolphinScheduler always in the "successfully submitted" status?
When I click the Start button to run the workflow, I meet the following situation: the task status is always stuck in the "successfully submitted" status. How can I solve this problem?

David
- 51
- 5
0
votes
0 answers
GCP Data Fusion Azure blob storage configuration: transfer multiple files
I am trying to transfer multiple CSV files from an Azure storage container to a GCP bucket through a Data Fusion pipeline.
I can successfully transfer a single file by specifying the path below (the full path to a specific CSV file) for the…

Srini V
- 65
- 1
- 1
- 8
0
votes
1 answer
Data Pipeline using SQL and Python
I need to create a data pipeline using Python. I want to connect to MySQL from Python, read the tables into dataframes, perform pre-processing, and then load the data back into the MySQL DB. I was able to connect to the MySQL DB using mysql connector and…

Disha09
- 11
- 1
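The read-transform-write loop described above maps directly onto pandas' `read_sql_query` and `to_sql`. In the sketch below SQLite stands in for MySQL so the example is self-contained; with MySQL you would instead build the connection via `mysql.connector` or a SQLAlchemy engine (an assumption about the asker's setup), and the table and column names are placeholders:

```python
import sqlite3
import pandas as pd

# In-memory SQLite stands in for the MySQL database here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 10.0), (2, None), (3, 30.0)])

# Extract: read the table into a DataFrame.
df = pd.read_sql_query("SELECT * FROM sales", conn)

# Transform: a placeholder pre-processing step (fill missing amounts with 0).
df["amount"] = df["amount"].fillna(0.0)

# Load: write the cleaned data back to a new table.
df.to_sql("sales_clean", conn, index=False, if_exists="replace")
print(conn.execute("SELECT amount FROM sales_clean WHERE id = 2").fetchone()[0])
# → 0.0
```

Writing to a separate `sales_clean` table (rather than `if_exists="replace"` on the source table) keeps the raw data intact if the transform has a bug.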
0
votes
1 answer
Insert into SQL Server table using Python from CSV and text file
I am trying to insert data from a CSV file and also from a text file into SQL Server (SSMS version 18.7). Below is my code.
import pyodbc
import csv
conn = pyodbc.connect('Driver={SQL Server};'
'Server=????;'
…

nikhil davis
- 35
- 7
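The truncated snippet above is the standard pyodbc shape: connect, then insert row by row. A sketch of the CSV half using parameterized inserts is below; SQLite stands in for SQL Server so the snippet runs anywhere, but with pyodbc the connection line would be `pyodbc.connect('Driver={ODBC Driver 17 for SQL Server};Server=...;')` and the `?` placeholder style is the same. The table, column names, and sample data are placeholders:

```python
import csv
import io
import sqlite3

# A StringIO stands in for the CSV file on disk.
csv_data = io.StringIO("id,name\n1,alpha\n2,beta\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")

reader = csv.reader(csv_data)
next(reader)  # skip the header row
# executemany with ? placeholders sends one parameterized INSERT per row.
conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)", reader)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # → 2
```

Parameterized inserts also avoid the quoting and injection problems that come with building the SQL string by concatenation.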
0
votes
1 answer
AWS Data Pipeline RDS to S3 activity error: Unable to establish connection to jdbc://mysql:
I am currently setting up an AWS Data Pipeline using the RDStoRedshift template. During the first RDStoS3Copy activity I am receiving the following error:
"[ERROR]…

kasey
- 1
- 1
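The endpoint in the title (`jdbc://mysql:`) suggests a malformed connection string: for MySQL the JDBC scheme is `jdbc:mysql://`, with the colon before the slashes. A well-formed RDS connection string has this shape (the hostname and database name below are placeholders, not values from the question):

```
jdbc:mysql://my-rds-instance.abc123.us-east-1.rds.amazonaws.com:3306/mydatabase
```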
0
votes
1 answer
Generate a progressive number when new records are inserted (some records need to have the same number)
The title can be a little confusing, so let me explain the problem. I have a pipeline that loads new records daily. These records contain sales. The key is . This data is loaded into a Redshift table and then exposed…

SGiux
- 619
- 3
- 10
- 34
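Assigning the same progressive number to records that share a key is what the `DENSE_RANK()` window function does, and Redshift supports it natively. SQLite (3.25+) is used below only so the query is runnable; the SQL itself would be the same on Redshift. The table and column names are placeholders, not taken from the post:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_key TEXT)")
conn.executemany("INSERT INTO sales VALUES (?)",
                 [("A",), ("A",), ("B",), ("C",), ("B",)])

# DENSE_RANK gives consecutive numbers with no gaps, and rows with the
# same key receive the same number.
rows = conn.execute("""
    SELECT sale_key,
           DENSE_RANK() OVER (ORDER BY sale_key) AS progressive_number
    FROM sales
    ORDER BY sale_key
""").fetchall()
print(rows)  # → [('A', 1), ('A', 1), ('B', 2), ('B', 2), ('C', 3)]
```

If the number must stay stable as new keys arrive daily, a rank computed at query time is not enough; you would instead persist a key-to-number mapping table and only assign fresh numbers to keys not seen before.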
0
votes
1 answer
Airflow 1.10.13, 2020-11-24 issues after update with pip install
Below is the configuration which worked until December 1st
composer-1.11.2-airflow-1.10.6
Python – 3.6
'dbt==0.17.0',
'google-cloud-storage',
'google-cloud-secret-manager==1.0.0',
'protobuf==3.12.2'
With the above configuration we are observing…

user2640679
- 54
- 3
0
votes
2 answers
DynamoDB data load after transforming files. Any AWS service like GCP Dataflow/Apache Beam?
New to AWS. I have a requirement to create a daily batch pipeline:
Read 6-10 1GB+ CSV files. (Each file is an extract of a table from a SQL DB.)
Transform each file with some logic and join all files to create one item per id.
Load this joined data…

jconnor198
- 13
- 3
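For the transform-and-join step, the closest AWS analogue to GCP Dataflow/Apache Beam for batch ETL is AWS Glue (managed Apache Spark). For the load step, DynamoDB's `BatchWriteItem` accepts at most 25 items per call, so a loader chunks the joined rows before writing; boto3's `table.batch_writer()` handles this automatically. The helper below just shows the chunking explicitly, as a sketch not tied to any particular AWS API call:

```python
# DynamoDB batch writes are capped at 25 items per request, so split the
# transformed rows into fixed-size batches before sending them.
def chunk(items, size=25):
    """Yield successive fixed-size batches from a list of items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

batches = list(chunk(list(range(60))))
print([len(b) for b in batches])  # → [25, 25, 10]
```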
0
votes
2 answers
Using a correct data pipeline for CloudSQL to BigQuery
I'm really new to this whole data engineering field, and I'm taking this matter on as my thesis project, so bear with me.
I'm currently developing a big data platform for a battery storage system that already has CloudSQL services that collect data every…

Iqbal Jurist
- 43
- 2
0
votes
1 answer
"AssertionError: Unrecognized instruction format" while splitting a dataset using the Splits API - TensorFlow 2.x
Please read the given problem.
You need to use subsets of the original cats_vs_dogs data, which is entirely in the 'train' split. I.e., 'train' contains 25,000 records with 1,738 corrupted images, so in total you have 23,262 images.
You will split it up…

Fawad
- 1
- 2