Questions tagged [cdap]

CDAP exposes developer APIs (Application Programming Interfaces) for creating applications and accessing core CDAP services. CDAP defines and implements a diverse collection of services that support applications and data on existing Hadoop infrastructure such as HBase, HDFS, YARN, MapReduce, Hive, and Spark.

138 questions
0 votes, 1 answer

MongoDB in CDAP

I am new to CDAP. I followed the "Learn CDAP: MongoDB to CDAP Table" YouTube tutorial video, but when I click Run, it returns the error: "Spark program 'phase-1' failed with error: Timed out after 30000 ms while waiting to connect. Client view of…
0 votes, 1 answer

table-lookup directive in my wrangler using datafusion

I'm trying to use the directive "table-lookup :country_code 'country_lookup_table'" in a Wrangler step in my Data Fusion pipeline, but I'm getting: "Error encountered while executing 'table-lookup' : Dataset 'country_lookup_table' could not be instantiated.…
0 votes, 1 answer

How to parse integer column to String in CDAP (Datafusion)

I need help understanding how Integer columns can be converted to String columns in Data Fusion using the Wrangler plugin. Apologies for this naïve question; I am quite new to GCP, and I tried googling it but could not find any…
0 votes, 0 answers

cdap - cloudfusion - parse csv and apply schema

I am trying to create a pipeline that performs the following tasks: read and parse the CSV file; apply a schema on top of it; write records that match the schema to a valid BigQuery table; records that don't match the schema (i.e. if column expect…
Gaurang Shah • 11,764 • 9 • 74 • 137
0 votes, 1 answer

Batch CSV file processing using data fusion

Can Data Fusion process CSV files from GCS in batches? I need to process multiple folders' worth of CSV files (with different structures) into BigQuery on my current project, and I am required to use Data Fusion. I tried simply connecting a GCS node with…
0 votes, 1 answer

Cloud data fusion iterating over same pipeline

Use case: an Excel file with multiple tabs is uploaded to Cloud Storage; a Cloud Function trigger calls a Cloud Data Fusion pipeline; the pipeline reads the file and uses Wrangler to read the individual sheets and write each one to a separate table. Though…
0 votes, 1 answer

Failed to lookup view "cdap" in views directories -- cdap_assets login_assets

We have cloned the CDAP UI repo from GitHub (https://github.com/cdapio/cdap-ui). Below are the Node and npm versions: "node -v" reports v10.24.1 and "npm -version" reports 6.14.12. We are able to run npm install successfully. However, when we hit the CDAP UI, few…
Sunil Gajula • 1,117 • 2 • 14 • 30
0 votes, 1 answer

Is there a way to build GCP Data Fusion pipeline using client SDK?

I am learning about GCP Data Fusion and can create a simple pipeline using the GUI. Can we do the same using an SDK? I searched a lot but couldn't find an example. Alternatively, I know we can export the pipeline as JSON, but can we automate the import?
JDev • 1,662 • 4 • 25 • 55
0 votes, 1 answer

Change Service Account in CDAP Preview mode

In a Cloud Data Fusion deployment, I have a requirement to enable preview mode for use by data engineers, but I also have a security requirement not to do that through the Google default service account. Is there a way to change the service account used in…
0 votes, 1 answer

Data Fusion - Argument Setter plugin throws Null pointer exception

Argument setter - HTTP post endpoint url. Below is the argument setter plugin configuration. { "name": "Argument Setter", "plugin": { "name": "ArgumentSetter", "type": "action", …
0 votes, 1 answer

Data Fusion - HTTP Post Callback Action

I have a pipeline alert set to make an HTTP POST call on completion of the pipeline. I need to push a file from a GCS bucket to the endpoint URL. Can someone help me achieve this?
aruna j • 91 • 5
0 votes, 1 answer

Data Fusion does not allow STRUCT type from BigQuery

I'm trying to create a pipeline in Data Fusion to read a table from BigQuery with a STRUCT type, but I received this error: 2021-06-01 19:13:53,818 - WARN [service-http-executor-1321:i.c.w.s.c.AbstractWranglerHandler@210] - Error processing GET…
0 votes, 1 answer

Is there any way to inject "Resources" memory values for a pipeline in Data Fusion?

I'm trying to automate some pipeline executions in Google Cloud Data Fusion (we are currently using 6.1.4 and 6.4.0). At the moment we inject some "runtime args" into Data Fusion through a PUT API call. My question is about injecting parameters…
xEDG • 1 • 1
0 votes, 2 answers

GCP Data Fusion : Custom Plugin Testing: Could not find artifact jdk.tools:jdk.tools:jar:1.6

I am trying to develop my own plugin for GCP Data Fusion. So I followed the documentation, and cloned the example from https://github.com/data-integrations/example-transform. But when building the project, I get a problem with the import of…
Nakeuh • 1,757 • 3 • 26 • 65
0 votes, 2 answers

How to perform multiple HTTP calls from within a GCP DataFusion / CDAP pipeline

I have a GCP Data Fusion pipeline where I am performing a GET request on an API which returns me a JSON list of user information including the user id. I am able to do this successfully with the Data Fusion HTTP plugin (available in the Data Fusion…
FVCC • 262 • 2 • 16