Questions tagged [google-cloud-data-fusion]

Google Cloud Data Fusion is a fully managed, cloud-native data integration service that helps users efficiently build and manage ETL/ELT data pipelines. Data Fusion has a visual point-and-click interface, transformation blueprints, and connectors to make ETL pipeline development fast and easy. Cloud Data Fusion is based on the open-source CDAP project.

This tag can be added to any questions related to using/troubleshooting Google Cloud Data Fusion.

445 questions
1 vote, 2 answers

Replicating data from MySQL to BigQuery using GCP Data Fusion - Getting issue with 'Date' datatype

I wanted to replicate MySQL tables hosted on GCP Compute Engine to BigQuery. I referred to this document: https://cloud.google.com/data-fusion/docs/tutorials/replicating-data/mysql-to-bigquery and decided to use GCP Data Fusion for the…
1 vote, 1 answer

How to make GCP Data Fusion MySQL Replication work well with DateTime columns

I managed to have MySQL tables replicated into BigQuery fairly easily by following this article on Cloud Data Fusion Replication. However, there's an issue with the DateTime columns. All the DateTime columns have been replicated into BigQuery using…
1 vote, 1 answer

GCP Data Fusion - Can't find Replication section in the menu

I'm trying to follow this article to replicate an on-prem MySQL database to BigQuery. I've set up everything needed up to the "navigate to the Replication page" step, but I can't find the Replication page in the Cloud Data Fusion UI. Is this something I…
Simon Corcos
1 vote, 1 answer

Is it possible to get lineage metadata from the pipeline in my Data Fusion Action plugin?

I'm trying to get data lineage metadata like data source/schema and data target/schema in a custom Action plugin which gets executed after the successful run of the other steps in the pipeline. I have a basic Action plugin that executes but I'm…
1 vote, 1 answer

Google Cloud Data Fusion MySQL replication job Failed to merge a batch of changes from the staging table

I followed the documentation at https://cloud.google.com/data-fusion/docs/tutorials/replicating-data/mysql-to-bigquery to create a Cloud Data Fusion instance and connect to a MySQL replication instance (running MySQL 5.7 in replication mode, reading…
1 vote, 1 answer

Cloud Data Fusion problems reading a CSV export with the HTTP source

I am trying Cloud Data Fusion for the first time. I have this endpoint I'd like to consume as a test: https://waidlife.com/backend/export/index/export.csv?feedID=1&hash=4ebfa063359a73c356913df45b3fbe7f (this is a Shopware export). The header row tells…
xetra11
1 vote, 0 answers

Common error capture plugin for entire pipeline

Is it possible to have a single error capture plugin for an entire pipeline? If so, how can error records be routed to that plugin from all the other plugins?
adiideas
1 vote, 1 answer

Cloud Data Fusion - enabling Wrangler

I am unable to enable Wrangler from the Cloud Data Fusion console. I do get to a screen, but it looks nothing like what is shown in the tutorials or documentation. When I enable Wrangler, I get to a screen as shown in the attached image. This…
sacoder
1 vote, 1 answer

CDAP PUBSUB Realtime Pipeline MAP Datatype

I'm trying to pull from a Pub/Sub subscription using a CDAP realtime pipeline. I can connect to Pub/Sub, but the attributes column is coming through as a MAP datatype and I seem unable to do anything with it (I need the data in it). The idea is to…
1 vote, 1 answer

GCP Data Fusion Twitter Tweet Stream Error: java.lang.NoClassDefFoundError: org/apache/spark/Logging

I am trying to stream Twitter data into a BigQuery table using GCP Data Fusion. I've added my Twitter credentials to the Twitter component, and it validates with no errors. The BigQuery component also validates with no errors. When I run the preview…
1 vote, 1 answer

Datafusion load BQ with XML 2003 worksheet data

I have a system that exports data as an XML 2003 Worksheet. I need to load it into BigQuery through Data Fusion or any other process using GCP resources. Is it possible to do this with Data Fusion? I have followed the process for XML transformation…
1 vote, 2 answers

Cannot create a batch pipeline to get data from ZohoCRM with http plugin 1.2.1 to BigQuery. Returns Spark Program 'phase-1' failed

This is my first post here. I'm new to Data Fusion and have little to no coding skills. I want to get data from Zoho CRM into BigQuery, with each Zoho CRM module (e.g. accounts, contacts...) as a separate table in BigQuery. To connect to Zoho CRM I obtained a…
1 vote, 0 answers

Provisioning Issues in the DataFusion

When Data Fusion runs a data pipeline, it stays in the provisioning state and then stops. As a result, the Dataproc cluster cannot be created. Dataproc's settings are as follows: - Master - Number of masters : 1 - Master Cores : 2 - Master…
1 vote, 0 answers

Why is it not provisioned when running a data pipeline in a data fusion?

I am using Data Fusion Enterprise. Under Data Fusion > System Admin > Configuration > System Compute Profiles > Create New Profile, I set the configuration values for the master node and worker node. I also set the configuration for each data pipeline. (Executor,…
1 vote, 3 answers

Google Cloud Data Fusion is not producing CSV output in GCS Bucket

I have a pipeline that recursively reads many JSON files from a Google Cloud Storage (GCS) bucket, then parses each file into a record. Each record then goes through a "Python Transform" plugin for further processing (adding new fields and values),…