Questions tagged [google-cloud-data-fusion]

Google Cloud Data Fusion is a fully managed, cloud-native data integration service that helps users efficiently build and manage ETL/ELT data pipelines. Data Fusion has a visual point-and-click interface, transformation blueprints, and connectors to make ETL pipeline development fast and easy. Cloud Data Fusion is based on the open-source CDAP project.

This tag can be added to any question related to using or troubleshooting Google Cloud Data Fusion.

445 questions
0 votes, 1 answer

Can we use a JDBC driver to read from MariaDB and SAP HANA using Data Fusion?

I want to read data from MariaDB and SAP HANA and load it into BigQuery using Data Fusion. Is it possible to read them using a JDBC driver?
0 votes, 1 answer

How to pass the output of a node as a property variable to the next node in Data Fusion Studio

Q1: How do I read a configuration table or file that contains basic properties of the nodes (e.g., source and sink table names) and use that output in the next nodes of a Data Fusion pipeline? I tried to use the Remote Program Executor, with the command - bq…
0 votes, 1 answer

Unable to connect to a Cloud SQL MySQL instance from Data Fusion: exception "Could not create socket factory 'com.google.cloud.sql.mysql.SocketFactory'"

I'm getting the exception "Could not create socket factory 'com.google.cloud.sql.mysql.SocketFactory' due to underlying exception." when trying to connect to a MySQL instance from Google Cloud Data Fusion. I created a Cloud Data Fusion instance. From…
NitinM (303 reputation)
0 votes, 1 answer

Trying to upload the latest plugin for Google Cloud to Data Fusion but getting an error while uploading

This refers to an earlier post of mine: Possible to modify or delete rows from a table in BigQuery dataset with a Cloud Data Fusion pipeline? I am trying to follow the suggested answer and compile the latest version of the Google Cloud Platform plugin and…
Bluescrod (81 reputation)
0 votes, 1 answer

How can I read multiple CSV files from a GCS location, append them (i.e., stack them), and write them back to another GCS location using Data Fusion?

Why Data Fusion? Because I need to run several more steps (run Dataproc clusters), insert into databases, and do it on a schedule. Also, the data could grow to tens of TB or shrink to tens of GB.
Gaurav Taneja (1,084 reputation)
0 votes, 0 answers

Run exported Google Cloud Data Fusion pipeline

I have exported a Cloud Data Fusion pipeline. How can I trigger the job on a cluster whenever I want? I tried to find it in the documentation but couldn't get anywhere.
0 votes, 1 answer

When using a real-time pipeline, unable to feed data to BigQuery from GCS

I have developed a real-time pipeline in Data Fusion to fetch data from Pub/Sub, feed it into GCS, and then into BigQuery. However, after GCS (which is available as a sink), I am not able to feed the data into BigQuery because GCS is only available as a…
0 votes, 1 answer

Error establishing a connection to a local DB and Google Cloud SQL using Data Fusion

I need to create a pipeline to export data from a local PostgreSQL DB to Google Cloud SQL using Google Cloud Data Fusion. I am using Wrangler to first test the connections to the local DB and Cloud SQL. While trying to establish a connection to the local DB, I am…
0 votes, 1 answer

Implement SCD2 logic in Data Fusion and BigQuery

I am trying to implement SCD2 table loading with Data Fusion but can't seem to find the necessary building block to do it (something that was presented here). I could join the new records (stage table) with the target table, filter the…
pmatthew (23 reputation)
0 votes, 2 answers

Unable to connect to a Cloud SQL MySQL / PostgreSQL instance from Data Fusion

The goal is to connect to Cloud SQL MySQL or PostgreSQL instances using Cloud Data Fusion. I created Cloud SQL instances with MySQL and PostgreSQL and created a Cloud Data Fusion instance. From Wrangler > Add connection > Cloud SQL MySQL, added Data Fusion…
0 votes, 1 answer

How to fix "java.lang.NullPointerException: null" when doing MSSQL to BigQuery in Cloud Data Fusion

I'm working on a Cloud Data Fusion POC, and I'm attempting to create an MSSQL-to-BigQuery pipeline. The connection works, since I'm able to import my schema from a query; however, I'm getting a MapReduce Program "phase-1" failed with a…
0 votes, 1 answer

CDAP ingestion from PubSub

I'm trying to load data from Pub/Sub messages into GCS files. Simple pipeline: PubSub source -> JSON Parser -> GCS sink. Since Pub/Sub only accepts the data argument as UTF-8, how can I decode it in CDAP? Should I build a custom plugin implementing a…
0 votes, 1 answer

Failed to grant privileges to a Data Fusion Service Account when the project resides under an organisation

I would like to create a Data Fusion instance and grant the service account privileges to read and write to BigQuery. I am using the Beta version of Data Fusion and my project resides under an organisation. gcloud services enable…
Jeff Moszuti (63 reputation)
0 votes, 1 answer

Creating a Google Cloud Data Fusion instance does not create the service account

I have created a Google Cloud Data Fusion instance, and per the documentation I am searching for the service account listed to add the additional role. However, this service account is nowhere to be found in the IAM of the project. Am I expected to…
gae123 (8,589 reputation)
0 votes, 1 answer

Oracle Standard Edition and real-time pipeline

As mentioned in the Oracle documentation (https://docs.oracle.com/cd/B28359_01/license.111/b28287/editions.htm#DBLIC116), asynchronous change data capture is not available in Oracle Standard Edition. So the question is: is "realtime pipeline" (vs batch…
fml (1 reputation)