Questions tagged [google-cloud-data-fusion]

Google Cloud Data Fusion is a fully managed, cloud-native data integration service that helps users efficiently build and manage ETL/ELT data pipelines. Data Fusion has a visual point-and-click interface, transformation blueprints, and connectors to make ETL pipeline development fast and easy. Cloud Data Fusion is based on the open-source CDAP project.

This tag can be added to any questions related to using/troubleshooting Google Cloud Data Fusion.

445 questions
0 votes, 1 answer

How can I use Data Fusion to perform ETL operations when I have multiple data files (.txt) to convert, using functions within the Data Fusion UI?

The tasks are outlined below. Multiple data files within Google Cloud Storage (GCS) are stored in partitions (/directory01/directory02/.../.text). I am going to use Data Fusion to carry out the ETL work and load the result into a BigQuery table. The ETL operation was…
Quack
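
A pointer that may help with this pattern: the GCS batch source's path field generally accepts wildcard patterns (worth verifying for your Data Fusion version), so a partitioned layout can often be read with a single source rather than one pipeline per file. An illustrative pattern (bucket and directory names are placeholders):

    gs://my-bucket/directory01/*/*.txt
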
0 votes, 1 answer

How to resolve the 'unsupported type NULL' error when running a Data Fusion pipeline

I am working on extracting a schema from .txt data into BigQuery with Google Cloud Platform Data Fusion. First, the Data Fusion instance was created in Developer mode. Second, I pointed it to Google Cloud Storage, where the data was stored, and I converted it to JSON…
Quack
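
Background that may help readers hitting this error: CDAP (the project underlying Data Fusion) models a nullable field as a union of the field's type with NULL, so a field whose only inferred type is NULL is unsupported. A minimal Java sketch against the CDAP schema API (record and field names are illustrative):

    import io.cdap.cdap.api.data.schema.Schema;

    public class NullableSchemaSketch {
      public static void main(String[] args) {
        // A nullable string is the union [string, null] -- supported.
        // A bare NULL type with no non-null branch is what the error points at.
        Schema schema = Schema.recordOf(
            "txtRecord",
            Schema.Field.of("body", Schema.nullableOf(Schema.of(Schema.Type.STRING))));
        System.out.println(schema);
      }
    }
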
0 votes, 2 answers

Google Data Fusion XML parsing - 'parse-xml-to-json': Mismatched close tag note at 6

I am new to Google Cloud Data Fusion. I was able to successfully process a CSV file and load it into BigQuery. My requirement is to process an XML file and load it into BigQuery. To try it out, I just took a very simple XML. XML File: {
0 votes, 2 answers

Data Fusion: Not enough memory issue and lost executor issue

I am processing a file via a Google Data Fusion pipeline, but as the pipeline runs I am getting the warnings and errors below: 09/25/2020 12:31:31 WARN org.apache.spark.storage.memory.MemoryStore#66-Executor task launch worker for task 6 Not enough space…
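
If this reflects real executor memory pressure, one knob to look at is the runtime argument Data Fusion passes through to Spark executors. A hedged illustration (the argument name should be verified against your CDAP/Data Fusion version; the value, in MB, is arbitrary):

    task.executor.system.resources.memory = 4096
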
0 votes, 1 answer

Apply Rank or partitioned row_num function in Data Fusion

I want to implement a rank or partitioned row_num function on my data in Data Fusion, but I can't find any plugin to do so. Is there any way to do this? I want to implement the below. Suppose I have the above data; now I want to group the data…
0 votes, 1 answer

Error when trying to load data from Data Fusion to Salesforce

I'm getting this error when trying to load data from Data Fusion to Salesforce: java.lang.RuntimeException: There was issue communicating with Salesforce at…
0 votes, 1 answer

File-triggered CDF job

In CDAP, a partition trigger type is available as shown below.

    schedule(buildSchedule("runOnlyAtNight", ProgramType.WORKFLOW, "cleanupWorkflow")
        .withTimeWindow("22:00", "06:00")
        .waitUntilMet()
        .triggerOnPartitions("myDataset", 1));

Is it available in Cloud…
adiideas
0 votes, 1 answer

Data Fusion: GCS Create creating folders, not objects

I am trying to create a GCS object (file) with the GCS Create plugin of Data Fusion, but it is creating a folder instead. How can I have a file created instead of a folder?
0 votes, 1 answer

In-built pipeline arguments

I wanted to know how to get the list of in-built pipeline arguments in a Data Fusion pipeline. I am not able to find them anywhere in the documentation or on the internet.
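
For what it's worth, the built-in values are mostly surfaced as macro functions rather than as a single documented list; the one most often reached for is the logical start time of the run. An illustrative usage in a plugin field (verify the function name against your Data Fusion version):

    ${logicalStartTime(yyyy-MM-dd)}
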
0 votes, 1 answer

Google CDF: Can we set the value of a column as a runtime argument?

I am getting a value returned by hitting an HTTP endpoint, which I am storing in a column. Now I want to trigger another HTTP endpoint with the value in that column, but the HTTP endpoint takes hardcoded values or macros only. So I want to know if I…
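
Context that frames this question: macros are resolved from runtime arguments when a run starts, not from per-record data, which is why plugin fields accept only hardcoded values or macros. A minimal Java sketch of supplying runtime arguments when starting a pipeline over the CDAP REST API (endpoint, app name, argument name, and token are placeholders; error handling omitted):

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class StartPipelineSketch {
      public static void main(String[] args) throws Exception {
        String endpoint = "https://<instance-endpoint>/api"; // placeholder
        String token = "<access-token>";                     // placeholder
        URL url = new URL(endpoint
            + "/v3/namespaces/default/apps/MyPipeline/workflows/DataPipelineWorkflow/start");

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Authorization", "Bearer " + token);
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);

        // Runtime arguments go in the body as a JSON map; a plugin field set to
        // ${myArg} is resolved from this map when the run starts.
        byte[] body = "{\"myArg\":\"some-value\"}".getBytes(StandardCharsets.UTF_8);
        try (OutputStream os = conn.getOutputStream()) {
          os.write(body);
        }
        System.out.println("HTTP " + conn.getResponseCode());
      }
    }
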
0 votes, 1 answer

Cloud Data Fusion: Oracle data type issues

In Cloud Data Fusion, I am using the provided Oracle plugin to get data from an Oracle 18c database. When using this as the source, I am able to successfully extract varchar data, but the number/integer types are not successful. The Oracle table values are: |ID…
Vaibhav
0 votes, 0 answers

Google Cloud Data Fusion produces inconsistent output data

I am creating a Data Fusion pipeline to ingest a CSV file from an S3 bucket, apply Wrangler directives, and store it in a GCS bucket. The input CSV file has 18 columns; however, the output CSV file has only 8 columns. I suspect that this could be…
0 votes, 1 answer

How to calculate the number of rows in CDAP/Data Fusion?

How do I calculate the number of rows? For example, I use the NullFieldSplitter plugin to divide the data into two parts, and I want to calculate the number of rows in each part. How can I calculate it? Can someone take a look and help me? Thanks.
Gray
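
One way readers have approached this, sketched here: each stage already reports records-in/records-out in the pipeline UI, and a custom CDAP transform can additionally increment a user metric per record on a given branch. A hedged Java sketch against the CDAP plugin API (plugin and metric names are illustrative):

    import io.cdap.cdap.api.annotation.Name;
    import io.cdap.cdap.api.annotation.Plugin;
    import io.cdap.cdap.api.data.format.StructuredRecord;
    import io.cdap.cdap.etl.api.Emitter;
    import io.cdap.cdap.etl.api.StageMetrics;
    import io.cdap.cdap.etl.api.Transform;
    import io.cdap.cdap.etl.api.TransformContext;

    @Plugin(type = Transform.PLUGIN_TYPE)
    @Name("RowCounter")
    public class RowCounter extends Transform<StructuredRecord, StructuredRecord> {
      private StageMetrics metrics;

      @Override
      public void initialize(TransformContext context) throws Exception {
        super.initialize(context);
        metrics = context.getMetrics();
      }

      @Override
      public void transform(StructuredRecord record, Emitter<StructuredRecord> emitter) {
        metrics.count("rows.seen", 1); // one increment per record in this branch
        emitter.emit(record);          // pass the record through unchanged
      }
    }
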
0 votes, 1 answer

Custom transform not getting applied in Wrangler in Google Cloud Data Fusion

I am trying the following custom transform in a Wrangler step in Google Cloud Data Fusion: set-column column (parse-as-json :column 2 ) ? column =^ "[" : (parse-as-json :column 1 ) I want to parse the column as JSON to a depth of 2 if it is an array, which…
0 votes, 2 answers

CDAP Data Fusion GET pipeline runs: Invalid IAP Credentials error

I am trying to make a GET API call to fetch the run history of a specific pipeline. The API URL is as follows: APIEndpoint/api/v3/namespaces/default/apps/DataPipeline_name/workflows/DataPipelineWorkflow/runs?limit=1 This API call needs an access token, which I get…
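
For reference, the shape of that call in Java, with the Authorization header attached (endpoint and token are placeholders; for Data Fusion the bearer token is a Google OAuth access token, e.g. from gcloud auth print-access-token):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class GetRunsSketch {
      public static void main(String[] args) throws Exception {
        String endpoint = "https://<instance-endpoint>/api"; // placeholder
        String token = "<access-token>";                     // placeholder
        URL url = new URL(endpoint + "/v3/namespaces/default/apps/DataPipeline_name"
            + "/workflows/DataPipelineWorkflow/runs?limit=1");

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Authorization", "Bearer " + token);

        try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
          reader.lines().forEach(System.out::println); // prints the runs JSON
        }
      }
    }
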