Questions tagged [google-cloud-data-fusion]

Google Cloud Data Fusion is a fully managed, cloud-native data integration service that helps users efficiently build and manage ETL/ELT data pipelines. Data Fusion has a visual point-and-click interface, transformation blueprints, and connectors to make ETL pipeline development fast and easy. Cloud Data Fusion is based on the open-source CDAP project.

This tag can be added to any questions related to using/troubleshooting Google Cloud Data Fusion.

445 questions
1
vote
1 answer

Can I run an exported Data Fusion pipeline through Dataproc directly?

I was having a concurrency issue (running multiple pipelines at the same time) and hence would like to run my exported pipelines (JSON) directly from Dataproc. Is there any tutorial or a script to help me do so? Any help or leads would be…
1
vote
0 answers

CDAP - embedding starting-timestamp logic inside a pipeline

I have a pipeline that makes a request to an API, and I would like to parametrize the start and end timestamps in the URL inside the HTTP plugin. The end timestamp is 2 hours before runtime, so it would be kind of easy to use a macro with the function…
Dani
  • 11
  • 1
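
A minimal sketch of the timestamp arithmetic described in the question above, computed outside the pipeline and handed in as values. The function name `api_window`, the argument names `start_ts` and `end_ts`, the one-hour window length, and the timestamp format are all assumptions for illustration; the excerpt only states that the end timestamp is 2 hours before runtime. One possible use is to pass the resulting values as runtime arguments and reference them in the plugin URL with the `${start_ts}` and `${end_ts}` macro syntax.

```python
from datetime import datetime, timedelta, timezone


def api_window(run_time, end_offset_hours=2, window_hours=1):
    """Return (start, end) timestamp strings for the request URL.

    end   = run_time minus end_offset_hours (2 hours, as in the question)
    start = end minus window_hours (hypothetical; not given in the excerpt)
    """
    end = run_time - timedelta(hours=end_offset_hours)
    start = end - timedelta(hours=window_hours)
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # assumed format; adjust to what the API expects
    return start.strftime(fmt), end.strftime(fmt)


# Fixed runtime so the example is reproducible; a scheduler would supply this.
run_time = datetime(2021, 3, 1, 12, 0, 0, tzinfo=timezone.utc)
start_ts, end_ts = api_window(run_time)

# Hypothetical runtime-argument names that a macro-enabled URL could reference.
runtime_args = {"start_ts": start_ts, "end_ts": end_ts}
print(runtime_args)
```

Computing the window in the caller keeps the pipeline itself free of date logic, at the cost of requiring every trigger to supply both arguments.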
1
vote
1 answer

How to specify which GCP project to use when triggering a pipeline through Data Fusion operator on Cloud Composer

I need to trigger a Data Fusion pipeline located on a GCP project called myDataFusionProject through a Data Fusion operator (CloudDataFusionStartPipelineOperator) inside a DAG whose Cloud Composer instance is located on another project called…
1
vote
1 answer

What to pass as system.profile.name when selecting a Data Fusion profile name to reliably fall back to the default profile?

For very specific reasons in our use case we have to pass something as the value of system.profile.name property, when we execute a CloudDataFusionStartPipelineOperator from an Airflow DAG. We use this property for selecting a DataProc cluster for…
elaspog
  • 1,635
  • 3
  • 21
  • 51
1
vote
1 answer

Need to read specific columns in Wrangler Excel read

I am using Wrangler to read an Excel file and transform it. The issue is that Wrangler gives an option for the sheet number/name, but I also need to specify the columns to be read, e.g. 'B1:E450'. I could not get any combination of column declaration to work. Any help is…
1
vote
0 answers

Cloud Data Fusion to sync tables from BigQuery to Cloud Spanner

I have a use case where I need to sync Spanner tables with BigQuery tables, so I need to update the Spanner tables based on the updated data in the BigQuery tables. I am planning to use Cloud Data Fusion for this, but I do not see any example…
1
vote
1 answer

Exception : The specified bucket does not exist - while Storing output in GCS (Sink) using Data Fusion

I have a Data Fusion pipeline that reads from a GCS bucket, does some transformation, and then stores the output (sink) in another Cloud Storage bucket. However, I am getting the exception below when the pipeline runs. I have granted the Cloud Storage Admin role to…
1
vote
1 answer

How to send a file through HTTP POST using Data Fusion

I want to send a file to an HTTP endpoint URL using Data Fusion. I am making this HTTP call as a pipeline alert at the completion of the pipeline, but it is not working: I am getting a 500 response from the API. Can someone help me with how to send the file? If…
aruna j
  • 91
  • 5
1
vote
1 answer

How to return Object after evaluating the Templates in Airflow?

We are designing variable selection and parameter setter logic that needs to be evaluated when the DAG is triggered. Our DAGs are generated before the execution. We've decided to modify our static code into custom macros. Until this time there was…
elaspog
  • 1,635
  • 3
  • 21
  • 51
1
vote
1 answer

Could not connect to Cloud SQL PostgreSQL from Data Fusion

Trying to connect to a Cloud SQL PostgreSQL database from Cloud Data Fusion (both private instances in the same VPC, not shared) as described here step by…
1
vote
2 answers

Cloud Data Fusion - Secret Manager integration

I am building a Cloud Data Fusion pipeline that will connect to a database to pull data. The requirement is to keep the database user ID and password in GCP Secret Manager. How do I read these details as part of a macro? If it is not…
Thelight
  • 359
  • 1
  • 5
  • 15
1
vote
3 answers

Google Cloud Spanner real time Change Data Capture to PubSub/Kafka through Cloud Data Fusion or Others

I would like to achieve a real time change data capture (log-based preferred) pipeline from Google Cloud Spanner to PubSub/Kafka for my downstream real time applications. Could you please let me know if there is a great and cost-effective way to…
1
vote
2 answers

Cloud Data Fusion - Existing Dataproc option missing

According to the documentation, there is an option to use an existing Dataproc cluster in version 6.2 and above. We use Cloud Data Fusion 6.2.0, but the existing Dataproc option does not appear when we try to create a new compute profile. What are we doing…
1
vote
1 answer

Input record does not contain field in Data Fusion

I am creating a pipeline from a SQL Server instance on Compute Engine, and I want to migrate this data to BigQuery. In preview everything is fine and I can view the rows without problems, but when I run the deployment in Data Fusion the…
1
vote
0 answers

Data Fusion pipelines fail without executing

I have more than 50 Data Fusion pipelines running concurrently in an Enterprise instance of Data Fusion. About 4 of them randomly fail at each concurrent run, showing in the logs only the provision operation followed by the deprovision of the…