Questions tagged [google-cloud-data-fusion]

Google Cloud Data Fusion is a fully managed, cloud-native data integration service that helps users efficiently build and manage ETL/ELT data pipelines. Data Fusion has a visual point-and-click interface, transformation blueprints, and connectors to make ETL pipeline development fast and easy. Cloud Data Fusion is based on the open-source CDAP project.

This tag can be added to any questions related to using/troubleshooting Google Cloud Data Fusion.

445 questions
0
votes
2 answers

Google Data Fusion: reading files from multiple subfolders in a bucket and placing them in another folder inside the subfolder

Example: sameer/student/land/compressed files, sameer/student/pro/uncompressed files, sameer/employee/land/compressed files, sameer/employee/pro/uncompressed files. In the above example I need to read files from all LAND folders present in different sub…
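Outside of Data Fusion itself, the file movement described here can be sketched with the google-cloud-storage client: list everything under each land/ prefix and copy it to the sibling pro/ prefix. This is a minimal sketch only; the bucket name is a hypothetical placeholder, and the decompression step is left out.

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")  # hypothetical bucket name

for entity in ("student", "employee"):
    land_prefix = f"sameer/{entity}/land/"
    for blob in client.list_blobs(bucket, prefix=land_prefix):
        # land/... -> pro/... keeps the rest of the object path intact
        target_name = blob.name.replace("/land/", "/pro/", 1)
        bucket.copy_blob(blob, bucket, target_name)
```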
0
votes
1 answer

Google Cloud Data Fusion is appending a column to original data

When I am loading encrypted data from a GCS source to a GCS sink, there is one additional column getting added. Original data: Employee ID,Employee First Name,Employee Last Name,Employee Joining Date,Employee…
0
votes
1 answer

How to use a BigQuery view as a source in Cloud Data Fusion?

I am able to use a BigQuery table as the source and complete the task (screenshot-1), but when I put a BigQuery view in place of the table, it throws an error (screenshot-2).
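A commonly suggested workaround, since the BigQuery batch source expects a table, is to materialize the view into a staging table before the pipeline reads it (for example from a BigQuery Execute action). A minimal sketch with the google-cloud-bigquery client, using hypothetical project and dataset names:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Materialize the view into a table the Data Fusion BigQuery source can read.
sql = """
CREATE OR REPLACE TABLE `my_project.my_dataset.my_view_materialized` AS
SELECT * FROM `my_project.my_dataset.my_view`
"""
client.query(sql).result()  # block until the job finishes
```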
0
votes
1 answer

Renaming output file using Spark tool within Google Data Fusion

I have a pipeline in Google Data Fusion which produces a CSV file with the name "part-00000-XXXXXX" (and a file called "_SUCCESS") in a target directory in a Google Cloud bucket. The rest of the file name after "part-00000" is always different and…
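GCS has no true rename, so the usual approach is a copy-then-delete after the pipeline finishes. A minimal sketch with the google-cloud-storage client, where the bucket, output prefix, and final name are hypothetical; Bucket.rename_blob does the copy and delete pair in one call:

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")

# There should be exactly one part-00000-* object under the output prefix.
for blob in client.list_blobs(bucket, prefix="output/part-00000"):
    bucket.rename_blob(blob, "output/report.csv")  # copy + delete internally
```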
0
votes
1 answer

Cloud Data Fusion: Datastore Inconsistent Data Type Issues

I am working with a Google Datastore data source, and within the kind (table) there's a field which holds a number which in some records shows as an INTEGER type and in others as a FLOAT type. When running that source, Data Fusion throws an error…
Adolfo Garza
  • 2,966
  • 12
  • 15
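Before changing the pipeline, it can help to confirm the mix of types stored in the kind; inside Data Fusion, the usual fix is then to declare the field as a float/double in the source's output schema so the integer records are widened. A small diagnostic sketch with the google-cloud-datastore client, where the kind and field names are hypothetical:

```python
from collections import Counter
from google.cloud import datastore

client = datastore.Client()

# Count the Python types actually stored in the field across all records.
field_types = Counter(
    type(entity.get("amount")).__name__
    for entity in client.query(kind="MyKind").fetch()
)
print(field_types)  # e.g. Counter({'int': 120, 'float': 7})
```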
0
votes
1 answer

User not authorized to act as service account

I'm a newbie on GCP, going to transfer tables from Azure Blob Storage to a Cloud Storage bucket. I followed the instructions here (use Data Fusion). When I finished deploying the pipeline and was going to run it, I got an error, and in the advanced log is…
Cambn
  • 11
  • 3
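This error usually means some principal lacks the Service Account User role (roles/iam.serviceAccountUser) on the service account that the pipeline's Dataproc cluster runs as. A hedged sketch of granting that role through the IAM API with google-api-python-client; both e-mail addresses below are hypothetical placeholders:

```python
import google.auth
from googleapiclient import discovery

creds, project = google.auth.default()
iam = discovery.build("iam", "v1", credentials=creds)

# Hypothetical: the SA the cluster runs as, and the Data Fusion service agent.
target_sa = (
    f"projects/{project}/serviceAccounts/"
    "123456789-compute@developer.gserviceaccount.com"
)
member = "serviceAccount:service-123456789@gcp-sa-datafusion.iam.gserviceaccount.com"

# Read-modify-write the service account's IAM policy.
policy = iam.projects().serviceAccounts().getIamPolicy(resource=target_sa).execute()
policy.setdefault("bindings", []).append(
    {"role": "roles/iam.serviceAccountUser", "members": [member]}
)
iam.projects().serviceAccounts().setIamPolicy(
    resource=target_sa, body={"policy": policy}
).execute()
```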
0
votes
1 answer

Is it possible to use Cloud Data Fusion for FTP -> GCS -> BQ?

I am brand new to GCP and Cloud Data Fusion. I see that you can use this service to integrate data across data sources into a data lake. I have a number of SFTP providers offering files in different structured formats, e.g. csv, json, parquet, and…
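Data Fusion has an FTP batch source, and SFTP plugins are available through the Hub, but a staging step outside the pipeline also works: pull the file over SFTP and drop it in a landing bucket that a GCS source -> BigQuery sink pipeline picks up. A minimal sketch assuming paramiko and password auth; the host, paths, and bucket are hypothetical:

```python
import paramiko
from google.cloud import storage

# Fetch one file from the SFTP provider (hypothetical host/credentials).
transport = paramiko.Transport(("sftp.example.com", 22))
transport.connect(username="user", password="secret")
sftp = paramiko.SFTPClient.from_transport(transport)
sftp.get("/outbox/data.csv", "/tmp/data.csv")
sftp.close()
transport.close()

# Stage it in the landing bucket for the Data Fusion pipeline to consume.
storage.Client().bucket("landing-bucket").blob(
    "inbound/data.csv"
).upload_from_filename("/tmp/data.csv")
```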
0
votes
1 answer

I have a question about the DataFusion Data Pipeline

I have a question about the Data Fusion data pipeline. I'm using the Enterprise version of Data Fusion. When I create a data pipeline in the Data Fusion Studio, you can set the CPU and memory values of the executor and driver directly in…
Quack
  • 680
  • 1
  • 8
  • 22
0
votes
2 answers

REST API calls for setting namespace preferences and Program preferences

Can the namespace preferences and program preferences be set via REST API calls? If yes, what is the syntax for it?
adiideas
  • 25
  • 4
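Yes, in CDAP terms: preferences have a dedicated HTTP API, and Data Fusion exposes the CDAP REST endpoints under the instance's apiEndpoint. A hedged sketch with requests; the endpoint URL, token, and app/program names are hypothetical, and the program-level path follows CDAP's /namespaces/{ns}/apps/{app}/{program-type}/{program-id}/preferences pattern:

```python
import requests

# Hypothetical instance endpoint (see `gcloud beta data-fusion instances
# describe` for the real apiEndpoint) and OAuth2 access token.
API = "https://example-usw1.datafusion.googleusercontent.com/api"
HEADERS = {"Authorization": "Bearer ACCESS_TOKEN"}

# Set namespace-level preferences (body is a flat JSON map of properties).
requests.put(
    f"{API}/v3/namespaces/default/preferences",
    headers=HEADERS,
    json={"my.namespace.key": "value"},
)

# Set program-level preferences for a deployed pipeline's workflow.
requests.put(
    f"{API}/v3/namespaces/default/apps/myPipeline"
    "/workflows/DataPipelineWorkflow/preferences",
    headers=HEADERS,
    json={"my.program.key": "value"},
)
```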
0
votes
1 answer

I'm curious about the internal workflow of GCP's Data Fusion

I've used Google Cloud Platform's Data Fusion product in Developer and Enterprise mode. In Developer mode, there was no Dataproc setting (Master node, Worker node). In Enterprise mode, there was a Dataproc setting value (Master node, Worker…
0
votes
0 answers

workflow token in CDF

Is the Workflow Token feature available in CDF? The use case I see is that some variables could be pushed to the workflow level and passed to a subsequent pipeline (application), which would be a very handy feature. At present, if pipeline_1 > pipeline_2 >…
0
votes
1 answer

How to resolve this error in Google Data Fusion: "Stage x contains a task of very large size (2803 KB). The maximum recommended task size is 100 KB."

I need to move data from a parameterized S3 bucket into Google Cloud Storage. Basic data dump. I don't own the S3 bucket. It has the following syntax: s3://data-partner-bucket/mykey/folder/date=2020-10-01/hour=0 I was able to transfer data at the…
Rho
  • 1
  • 2
0
votes
1 answer

GCP Data Fusion multiple table import

I'm trying to use the Multiple Database Tables and BigQuery Multi Table Data Fusion plugins to import multiple tables in one pipeline, but when I try to execute I get the following error: java.util.concurrent.ExecutionException:…
0
votes
1 answer

I wonder if I can run a data pipeline over a directory with a specific name in Data Fusion

I'm using Google Cloud Platform Data Fusion. Assuming that the bucket's path is as follows: test_buk/... In the test_buk bucket there are four files: 20190901, 20190902, 20191001, 20191002. Let's say there is a directory inside test_buk called dir. I…
Quack
  • 680
  • 1
  • 8
  • 22
0
votes
1 answer

How can I use GCS Delete in Data Fusion Studio?

Apologies if this is very simple, but I am a complete beginner at GCP. I've created a pipeline that picks up multiple CSVs from a bucket, wrangles them, then writes them into BigQuery. I want it to then delete the contents of the bucket folder the…
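For the cleanup step itself, there is a GCS Delete plugin among the action plugins in the Studio palette, which can run after the sink succeeds; the same effect outside the pipeline is a prefix-wide delete with the google-cloud-storage client. A minimal sketch with hypothetical bucket and folder names:

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")

# Delete every object under the folder prefix once the load has succeeded.
for blob in client.list_blobs(bucket, prefix="input-folder/"):
    blob.delete()
```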