Questions tagged [cdap]

CDAP exposes developer APIs (Application Programming Interfaces) for creating applications and accessing core CDAP services. CDAP defines and implements a diverse collection of services that support applications and data on existing Hadoop infrastructure such as HBase, HDFS, YARN, MapReduce, Hive, and Spark.

138 questions
0
votes
1 answer

GCP Data Fusion multiple table import

I'm trying to use the Multiple Database Tables and BigQuery Multi Table Data Fusion plugins to import multiple tables in one pipeline, but when I try to execute it I get the following error: java.util.concurrent.ExecutionException:…
0
votes
2 answers

cdap sandbox won't start - Unable to read transaction state version

I have installed the binaries for the CDAP sandbox using the recipe found here. I was building a plugin and may have had a debugger blocking work. I rebooted my Linux PC on which the sandbox was running and now when I try and start the CDAP…
Kolban
0
votes
2 answers

Data Fusion: Not enough memory issue and Lost Executor Issue

I am processing a file via a Google Data Fusion pipeline, but as the pipeline runs I get the warnings and errors below: 09/25/2020 12:31:31 WARN org.apache.spark.storage.memory.MemoryStore#66-Executor task launch worker for task 6 Not enough space…
0
votes
1 answer

Apply Rank or partitioned row_num function in Data Fusion

I want to implement a rank or partitioned row_num function on my data in Data Fusion, but I can't find any plugin to do so. Is there any way to do this? I want to implement the following. Suppose I have the above data; now I want to group the data…
0
votes
1 answer

Data Fusion: GCS create creating folders not object

I am trying to create a GCS object (file) with the GCS Create plugin of Data Fusion, but it is creating a folder instead. How can I have a file created instead of a folder?
0
votes
1 answer

Inbuild Pipeline arguments

I want to know how to get the list of built-in pipeline arguments in a Data Fusion pipeline. I can't find them anywhere in the documentation or on the internet.
0
votes
1 answer

How to calculate the number of rows in CDAP/DATA Fusion?

How can I calculate the number of rows? For example, I use the NullFieldSplitter plugin to divide the data into two parts, and I want to calculate the number of rows in each part. Could someone take a look and help? Thanks.
Gray
0
votes
1 answer

Custom transform not getting applied in wrangler in Google Cloud Data Fusion

I am trying the following custom transform in Wrangler in Google Cloud Data Fusion: set-column column (parse-as-json :column 2 ) ? column =^ "[" : (parse-as-json :column 1 ) I want to parse column as JSON to a depth of 2 if it is an array, which…
0
votes
2 answers

CDAP DataFusion GET Pipeline Runs Invalid IAP Credentials Error

I am trying to make a GET API call to get the run history of a specific pipeline. The API URL is as follows: APIEndpoint/api/v3/namespaces/default/apps/DataPipeline_name/workflows/DataPipelineWorkflow/runs?limit=1 This API call needs an access token, which I get…
0
votes
1 answer

How to use CDAP http plugin to get google cloud api with OAUTH 2.0?

I want to get a BigQuery schema from the Google Cloud API. I have configured everything in GCP and can get the info from Postman, but how do I configure it in CDAP? Here is my config: My CDAP version is 6.2.0.
Gray
0
votes
1 answer

Do we have expiration for google cloud data fusion drafts

Is anyone aware whether saved CDAP drafts have any expiry? I have created a few drafts in one of our Cloud Data Fusion instances, and the same instance is being used by other folks. But after 2-3 days, when I tried to retrieve the draft, I found it…
Psen
0
votes
1 answer

CDAP API Calling in GCP failing

I am trying to create a sample pipeline in my Data Fusion instance as part of my project POC. I am using the CDAP API to automate the pipeline creation. I am facing an issue while calling the CDAP API below in GCP: curl -H "Authorization: Bearer $(gcloud…
0
votes
2 answers

unable to delete custom plugin from datafusion instance

I tried uploading a custom jar as a CDAP plugin, and it has a few errors in it. I want to delete that particular plugin and upload a new one. What is the process for this? I tried looking at the documentation, but it was not very informative. Thanks in…
0
votes
1 answer

Converting data types of Foreign Keys to use Joiner in Google Cloud Data Fusion Pipeline

I am building a pipeline that connects to an on-prem Oracle DB using the Database plugin, queries two tables (table_a, table_b), and joins those tables using the Joiner plugin before uploading to a BigQuery table. The problem I have now is that the…
Korean_Of_the_Mountain
0
votes
1 answer

Why datafusion HTTP plugin URL macro is not working?

I am exploring macros in Data Fusion pipelines. I am using the HTTP Sink plugin and trying to enable the macro option for the URL field, like {URL}. When I try to deploy the pipeline, it throws the following error: Failed to configure pipeline: Stage…