Questions tagged [google-cloud-data-fusion]

Google Cloud Data Fusion is a fully managed, cloud-native data integration service that helps users efficiently build and manage ETL/ELT data pipelines. Data Fusion has a visual point-and-click interface, transformation blueprints, and connectors to make ETL pipeline development fast and easy. Cloud Data Fusion is based on the open-source CDAP project.

This tag can be added to any questions related to using/troubleshooting Google Cloud Data Fusion.

445 questions
4
votes
1 answer

Connecting to Cloud SQL MySQL

We would like to test connecting Cloud SQL (MySQL) to BigQuery using Cloud Data Fusion. What is the proper way to connect to Cloud SQL, as that does not appear to be "built in" at this point in time? What driver is recommended, and are there any…
Greg
  • 41
  • 3
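For the Cloud SQL question above, a commonly suggested workaround (since a dedicated Cloud SQL connector was not built in at the time) is to upload the Cloud SQL MySQL socket factory JDBC driver and use the generic Database source. A minimal sketch of the connection string that source would need, with every identifier below a placeholder rather than a value from the question, could look like this:

```python
# Sketch only: assembles the JDBC URL typically used with the Cloud SQL
# MySQL socket factory driver. Project, region, instance, and database
# names are all placeholders.
project = "my-project"
region = "us-central1"
instance = "my-mysql-instance"
database = "my_database"

connection_string = (
    f"jdbc:mysql://google/{database}"
    f"?cloudSqlInstance={project}:{region}:{instance}"
    "&socketFactory=com.google.cloud.sql.mysql.SocketFactory"
    "&useSSL=false"
)
print(connection_string)
```

This string would go into the Database source plugin's connection string field once the driver JAR has been uploaded to the instance.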
4
votes
2 answers

Perform custom SQL query with Google Cloud Data Fusion

I have data pipelines that consist of multiple SQL queries run against BigQuery tables. I would like to build these in Google Cloud Data Fusion, but I don't see an option to transform/select with custom SQL. Is this available, or am I…
Ben P
  • 3,267
  • 4
  • 26
  • 53
3
votes
1 answer

Cloud Data Fusion vs Dataproc

Cloud Data Fusion offers the ability to create ETL jobs using its graphical pipeline UI, whereas Dataproc lets us run previously created Spark/Hadoop/Hive jobs. With my limited experience in both of these services, I have found Cloud…
3
votes
1 answer

Array issue in Data Fusion

I'm currently in the process of integrating a MongoDB database into BigQuery through Data Fusion, and I'm facing an issue with array objects. It seems that Data Fusion doesn't understand or support such data types; however, it seems to be a feature that…
3
votes
1 answer

Data Fusion - Issue with HTTP POST plugin

I am trying to make an HTTP call using Data Fusion. Source: a CSV file in GCS. Sink: HTTP POST. The API expects the file as part of the HTTP request. When this is executed, I get the error below in the API logs: Required request part 'file' is not…
3
votes
1 answer

Using Google Sheets Source Plugin in GCP Data Fusion gives a 403 Forbidden error in Directory Identifier, service account

I need to pull a Google Sheet through Data Fusion. There is some documentation, but it does not provide a practical example of how to configure the fields. Currently I am receiving a 403 Forbidden error in Directory Identifier, service account details…
RaptorX
  • 113
  • 10
3
votes
2 answers

How to connect Data Fusion to Cloud SQL Proxy

I'm on a journey trying to connect Data Fusion to Cloud SQL MySQL over private IP. I've read many resources, and it seems that it is possible (at least I'm still not convinced that it is not possible). What I have so far: a Data Fusion private…
3
votes
5 answers

Dataproc operation failure: INVALID_ARGUMENT: User not authorized to act as service account

I'm trying to run a pipeline from Cloud Data Fusion, but I'm receiving the following error: io.cdap.cdap.runtime.spi.provisioner.dataproc.DataprocRuntimeException: Dataproc operation failure: INVALID_ARGUMENT: User not authorized to act as service…
lucas.coelho
  • 894
  • 1
  • 9
  • 16
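A commonly cited fix for the error above is to let the Data Fusion service agent act as the service account that the ephemeral Dataproc cluster runs as, by granting it the Service Account User role on that account. The following is only a sketch using the IAM API; both service account emails are placeholders, not values from the question:

```python
# Sketch: granting roles/iam.serviceAccountUser on the service account that
# Dataproc runs as to the Data Fusion service agent, a commonly cited fix
# for "User not authorized to act as service account".
# Both email addresses below are placeholders for illustration only.
import google.auth
from googleapiclient import discovery

credentials, project_id = google.auth.default()
iam = discovery.build("iam", "v1", credentials=credentials)

# Service account the Dataproc cluster runs as (placeholder).
compute_sa = "PROJECT_NUMBER-compute@developer.gserviceaccount.com"
# Data Fusion service agent (placeholder; the real one uses the project number).
datafusion_agent = "service-PROJECT_NUMBER@gcp-sa-datafusion.iam.gserviceaccount.com"

resource = f"projects/-/serviceAccounts/{compute_sa}"
policy = iam.projects().serviceAccounts().getIamPolicy(resource=resource).execute()
policy.setdefault("bindings", []).append(
    {
        "role": "roles/iam.serviceAccountUser",
        "members": [f"serviceAccount:{datafusion_agent}"],
    }
)
iam.projects().serviceAccounts().setIamPolicy(
    resource=resource, body={"policy": policy}
).execute()
```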
3
votes
1 answer

Cloud Data Fusion Oracle Source Preview Error

I have a question for clarification as well as 2 errors using Cloud Data Fusion. Background: I am creating a pipeline to move data from a single table in Oracle (version 11.2.0.4, local server) into BigQuery using Cloud Data Fusion. I have downloaded…
3
votes
3 answers

GCP Data Fusion StatusRuntimeException: INVALID_ARGUMENT: Insufficient 'DISKS_TOTAL_GB' quota. Requested 3000.0, available 2048.0

I'm trying to deploy a pipeline in GCP Data Fusion. I was initially working on the free account, but upgraded in order to increase quotas as recommended in the following question seen here. However, I am still unclear based on the accepted answer as…
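This error (and the similar one further down the list) refers to the regional DISKS_TOTAL_GB Compute Engine quota that the ephemeral Dataproc cluster's disks count against. A quick way to inspect the current usage and limit, sketched here with a placeholder project and region, is the Compute Engine API:

```python
# Sketch: reading the regional DISKS_TOTAL_GB quota that ephemeral
# Dataproc clusters launched by Data Fusion count against.
# The project ID and region below are placeholders.
import google.auth
from googleapiclient import discovery

credentials, _ = google.auth.default()
compute = discovery.build("compute", "v1", credentials=credentials)

region_info = compute.regions().get(
    project="my-project", region="us-central1"
).execute()

for quota in region_info.get("quotas", []):
    if quota["metric"] == "DISKS_TOTAL_GB":
        print(f"DISKS_TOTAL_GB: usage={quota['usage']}, limit={quota['limit']}")
```

If the limit cannot be raised, shrinking the worker disk size in the Data Fusion compute profile is the other commonly suggested route.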
3
votes
1 answer

Using a multi-character delimiter in Cloud Data Fusion

I am trying to read a CSV file in Cloud Data Fusion. The CSV file uses a multi-character delimiter (i.e. ~^~). When I try to parse the column using a custom delimiter, the tool only considers the first character and splits the file accordingly. I end…
Trishit Ghosh
  • 235
  • 3
  • 10
3
votes
2 answers

Google Data Fusion execution error "INVALID_ARGUMENT: Insufficient 'DISKS_TOTAL_GB' quota. Requested 3000.0, available 2048.0."

I am trying to load a simple CSV file from GCS to BQ using the Google Data Fusion free version. The pipeline is failing with an error; it reads: com.google.api.gax.rpc.InvalidArgumentException: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Insufficient…
3
votes
3 answers

Access CDAP REST API of a Cloud Data Fusion Instance

How do you access the CDAP REST API of a Cloud Data Fusion instance? I would like to use Cloud Composer to orchestrate my pipelines. I have an Enterprise Edition instance with private IP enabled, but I'm not able to find any documentation on how…
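For the question above, the usual pattern is to call the instance's CDAP API endpoint with an OAuth bearer token. A minimal sketch, where the endpoint URL is a placeholder for the instance's actual apiEndpoint, might look like:

```python
# Sketch: calling the CDAP REST API of a Data Fusion instance with an
# OAuth access token. The endpoint URL below is a placeholder; the real
# value is the instance's apiEndpoint from the instance details.
import google.auth
import google.auth.transport.requests
import requests

credentials, _ = google.auth.default()
credentials.refresh(google.auth.transport.requests.Request())

# Placeholder: replace with the instance's apiEndpoint.
api_endpoint = "https://INSTANCE-PROJECT-dot-REGION.datafusion.googleusercontent.com/api"

resp = requests.get(
    f"{api_endpoint}/v3/namespaces/default/apps",
    headers={"Authorization": f"Bearer {credentials.token}"},
)
resp.raise_for_status()
for app in resp.json():
    print(app["name"])
```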
3
votes
1 answer

Whitelist AWS RDS connection to Google Cloud Data Fusion

We have a Google Cloud Data Fusion instance that needs to connect to AWS RDS to pull data from it. The only problem is that we cannot whitelist port 1433 to the world to make a connection to Google Cloud Data Fusion. How can we make Google Cloud…
Raman
  • 1,221
  • 13
  • 20
3
votes
2 answers

Import/Export DataFusion pipelines

Does anyone know if it is possible to programmatically import/export Data Fusion pipelines (deployed or in draft status)? The idea is to write a script to drop and re-create a Data Fusion instance, in order to avoid billing when it's not used. Via gcloud…
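For the import/export question above, one approach (assuming the instance's CDAP API endpoint is reachable, as in the earlier sketch) is to pull each deployed pipeline's configuration over the CDAP REST API before deleting the instance; the endpoint and pipeline name below are placeholders:

```python
# Sketch: exporting a deployed pipeline's configuration over the CDAP REST
# API so it can be re-imported after the instance is re-created. The
# endpoint URL and pipeline name are placeholders.
import google.auth
import google.auth.transport.requests
import requests

credentials, _ = google.auth.default()
credentials.refresh(google.auth.transport.requests.Request())
headers = {"Authorization": f"Bearer {credentials.token}"}

api_endpoint = "https://INSTANCE-PROJECT-dot-REGION.datafusion.googleusercontent.com/api"
pipeline = "my_pipeline"  # placeholder name of a deployed pipeline

# The app detail typically includes the pipeline's JSON configuration,
# which can be saved and later redeployed on a new instance.
detail = requests.get(
    f"{api_endpoint}/v3/namespaces/default/apps/{pipeline}", headers=headers
).json()
with open(f"{pipeline}.json", "w") as f:
    f.write(detail["configuration"])
```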