Questions tagged [google-cloud-data-fusion]

Google Cloud Data Fusion is a fully managed, cloud-native data integration service that helps users efficiently build and manage ETL/ELT data pipelines. Data Fusion has a visual point-and-click interface, transformation blueprints, and connectors to make ETL pipeline development fast and easy. Cloud Data Fusion is based on the open-source CDAP project.

Use this tag for questions about using or troubleshooting Google Cloud Data Fusion.

445 questions

3 votes, 3 answers

Google Cloud Data Fusion: How to change datatype from string to date?

Does anyone know how to convert a string to a date in Data Fusion so that it is written to the target as 'Date' instead of string? We are using Data Fusion to consume a CSV from GCS (Google Cloud Storage). Data Fusion detects all fields as string, we'd…
TechNewbie
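
One commonly suggested route is to parse the field in Wrangler (for example with its `parse-as-simple-date` directive) and then set the field's output schema type to `date`; check the exact directive syntax against the Wrangler documentation. The Python sketch below only illustrates the string-to-date conversion such a step performs. The column name `order_date` and the `%Y-%m-%d` format are hypothetical examples, not values from the question.

```python
from datetime import datetime


def parse_date_column(rows, column, fmt):
    """Return copies of `rows` with `column` parsed from a string into a date.

    `fmt` is a strptime pattern such as "%Y-%m-%d" (a hypothetical example).
    """
    parsed = []
    for row in rows:
        new_row = dict(row)
        value = new_row.get(column)
        # Empty cells become None instead of raising ValueError.
        new_row[column] = datetime.strptime(value, fmt).date() if value else None
        parsed.append(new_row)
    return parsed


rows = [{"id": "1", "order_date": "2019-04-12"}, {"id": "2", "order_date": ""}]
print(parse_date_column(rows, "order_date", "%Y-%m-%d"))
```

Rows that do not match `fmt` raise `ValueError`, which is usually preferable to silently writing a wrong date.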

3 votes, 2 answers

How to edit an already published Cloud Data Fusion Pipeline

I have deployed a data pipeline in Google Cloud Data Fusion but it does not work as expected. Is there a way to edit an already deployed data pipeline in Cloud Data Fusion or must it be deleted and rebuilt from scratch and deployed again?
Terence Keys

3 votes, 2 answers

Failed to connect to MySQL using Google Data Fusion

I failed to connect to MySQL from Google Data Fusion. The steps: first, I added the connector https://dev.mysql.com/downloads/file/?id=462850; second, I tried to add a connection, which failed (screenshot of the MySQL error): Communications link failure The last…
hanane

3 votes, 1 answer

Loading many tables in Cloud Data Fusion fails with DAG error

I have an MS SQL Server data source with around 1000 tables, which I need to put into BigQuery. I was hoping to use Data Fusion to load them all into staging tables in BigQuery, and then perform transformations on them afterwards. However, as soon…
Bjoern

3 votes, 1 answer

Fail to start program run program_run

The source of the error: io.cdap.cdap.internal.app.runtime.distributed.remote.RemoteExecutionTwillRunnerService#543-runtime-startup-1 The error message: java.io.IOException: com.jcraft.jsch.JSchException: java.net.ConnectException: Connection…
JY2k

3 votes, 1 answer

Getting a connection timeout error when running a simple Data Fusion pipeline to export data from BigQuery and write it to GCS

I am trying to use the Google Data Fusion service. I created a simple pipeline that extracts data from BigQuery and loads it back to GCS in JSON format, but when I run the pipeline I get the error below. java.io.IOException:…
Mustaquim

3 votes, 2 answers

Appending incremental data in BigQuery from MySQL using Cloud Data Fusion

I want to schedule a pipeline that transfers MySQL data to Google BigQuery, but the complete data gets appended to the old table in BQ. I want only the incremental data to be appended...
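
The usual pattern for appending only new rows is a high-water mark: store the largest last-modified value loaded so far and, on the next run, select only rows newer than it. In Data Fusion the stored value would typically be fed into the source's import query through a runtime argument or macro; treat that wiring as an assumption to verify for your plugin version. The Python sketch below shows only the watermark logic; `updated_at` is a hypothetical column name.

```python
from datetime import datetime


def incremental_rows(rows, watermark, ts_column="updated_at"):
    """Return rows modified after `watermark`, plus the watermark for the next run.

    `ts_column` is a hypothetical last-modified column. A real pipeline would
    normally push this filter into the source query instead of filtering here.
    """
    new_rows = [r for r in rows if r[ts_column] > watermark]
    # Keep the previous watermark when nothing new arrived.
    next_watermark = max((r[ts_column] for r in new_rows), default=watermark)
    return new_rows, next_watermark


rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 3)},
]
batch, watermark = incremental_rows(rows, datetime(2024, 1, 2))
print([r["id"] for r in batch], watermark)
```

Only `batch` would be appended to the BigQuery table, and `watermark` would be persisted for the next scheduled run.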

3 votes, 1 answer

Deployed jobs stopped working with an image error?

In the last few hours I am no longer able to execute deployed Data Fusion pipeline jobs - they just end in an error state almost instantly. I can run the jobs in Preview mode, but when trying to run deployed jobs this error appears in the…
Jamie

3 votes, 3 answers

Cloud Data Fusion Wrangler stuck on enabling

I'm trying out Data Fusion, but Wrangler is stuck on "Enabling". Looking at the Dashboard, the Dataprep Service status is red. Log: 2019-04-12 11:23:32,923 - DEBUG [provisioning-service-12:i.c.c.i.p.t.ProvisioningTask@75] - Starting…
PowdyPowPow

3 votes, 1 answer

Cloud Data Fusion storage.buckets.list permission issue

I just installed Cloud Data Fusion, and get this error when I try to explore the “Cloud Storage Default” bucket. How do I fix this? cloud-datafusion-management-sa@xxxxxxxxxxxx-tp.iam.gserviceaccount.com does not have storage.buckets.list access to…
James

2 votes, 1 answer

GCP Pipeline Data Fusion Response code: 302

While running a data ingestion pipeline, I get the following error: java.io.IOException: Failed to send message for program run program_run:default.test-to-gcs.-SNAPSHOT.workflow.DataPipelineWorkflow.3 to…

2 votes, 0 answers

Dataproc workers stop unexpectedly

We're launching a replication instance to replicate data from MySQL to BigQuery. After a few hours, the instance goes into the Killed state. The logs show the following (I added the 3 comments): // DATA IS CORRECTLY LOADED IN BIGQUERY 2022-08-30…
bjovanov

2 votes, 1 answer

How to trigger a CDAP pipeline using Airflow operators?

I have an on-premises CDAP Data Fusion instance with multiple namespaces. How can I trigger a pipeline using Airflow operators? I have tried exploring the available Airflow operators and this page, but it was not very helpful…
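
Whatever operator is used, starting a deployed batch pipeline on a self-managed CDAP instance comes down to an HTTP call to the CDAP REST API, where batch pipelines run as the `DataPipelineWorkflow` program and runtime arguments can be sent as a JSON map in the request body. The sketch below only builds that request with the standard library; the host, namespace, and pipeline names are hypothetical placeholders, and how you send it from Airflow (for example from a `PythonOperator` or an HTTP operator) is left open.

```python
import json
import urllib.request
from urllib.parse import quote


def build_start_request(base_url, namespace, pipeline, runtime_args=None):
    """Build (but do not send) a POST request that starts a CDAP batch pipeline.

    `base_url`, `namespace`, and `pipeline` are placeholders for your instance.
    """
    path = "/v3/namespaces/{}/apps/{}/workflows/DataPipelineWorkflow/start".format(
        quote(namespace, safe=""), quote(pipeline, safe="")
    )
    # Runtime arguments travel as a JSON object; an empty object means none.
    body = json.dumps(runtime_args or {}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_start_request("http://cdap.example.internal/", "sales", "mysql-to-bq", {"env": "prod"})
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; add auth headers if your instance requires them.
```

Building one request per namespace keeps the multi-namespace case explicit, since the namespace is part of the URL path.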

2 votes, 1 answer

Changing a Data Fusion instance in Terraform causes an ERROR during plan

So consider the scenario where I have a Data Fusion in version 6.4.1 and I wish to re-deploy it as 6.5.0 version via Terraform (this is just an example, but the problem applies to any update to the Data Fusion instance). In Terraform, this implies…
FVCC

2 votes, 2 answers

Getting 403 when trying to connect to Cloud SQL instance

I'm trying to create a connection from Data Fusion to Cloud SQL Postgres. I'm stuck with a connection error and have no idea how to solve it. Here's what I have done so far: Datacloud API enabled; Data Fusion instance is created with private IP…