Questions tagged [cdap]

CDAP exposes developer APIs (Application Programming Interfaces) for creating applications and accessing core CDAP services. CDAP defines and implements a diverse collection of services that support applications and data on existing Hadoop infrastructure such as HBase, HDFS, YARN, MapReduce, Hive, and Spark.

138 questions
0
votes
1 answer

CDAP HTTP server plugin for listening to incoming requests

I am using cdap-sandbox-6.8.0.zip. Is there an HTTP server plugin in CDAP for listening to incoming HTTP requests?
Ajay
  • 47
  • 4
0
votes
0 answers

Error 401 unauthorised when creating CDAP namespace via Terraform access token

I am passing my code this way in the main.tf where I am creating the private Data Fusion instance. # CDAP namespace block data "google_client_config" "current" {} provider "cdap" { host = google_data_fusion_instance.instance.api_endpoint token =…
0
votes
2 answers

Cloud Data Fusion: Numeric datatype into BigQuery

I'm running a Data Fusion pipeline with Wrangler transformations and I want to store a value as Numeric in BigQuery for some precise arithmetic calculations, but Wrangler doesn't let me transform a FLOAT value to NUMERIC. I've tried with implicit…
0
votes
2 answers

Orchestrating a metadata pipeline for deployed pipelines in Data Fusion; due to organization restrictions I can't access Dataproc clusters

I'm orchestrating a pipeline to retrieve metadata of deployed pipelines in Data Fusion; due to organization restrictions I'm not able to access the Dataproc compute details. How do we access the Dataproc profile via the tenant project which stores the…
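(Side note on the "retrieve metadata of deployed pipelines" part: that piece can generally be done against the CDAP REST API exposed at the instance's API endpoint, separately from the restricted Dataproc/tenant-project details. A minimal Python sketch under that assumption; the endpoint URL, token, and namespace below are placeholders, not values from the question.)

    import requests

    # Hypothetical placeholders -- substitute your instance's values.
    CDAP_ENDPOINT = "https://<instance>-<project>-dot-<region>.datafusion.googleusercontent.com/api"
    AUTH_TOKEN = "<output of: gcloud auth print-access-token>"
    NAMESPACE = "default"

    headers = {"Authorization": f"Bearer {AUTH_TOKEN}"}

    # List deployed pipelines (CDAP applications) in the namespace.
    resp = requests.get(f"{CDAP_ENDPOINT}/v3/namespaces/{NAMESPACE}/apps", headers=headers)
    resp.raise_for_status()

    for app in resp.json():
        print(app.get("name"), app.get("artifact", {}).get("name"))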
0
votes
1 answer

Equivalent of UNION in Data Fusion?

I am looking to perform the equivalent of SQL UNION within one pipeline, in Data Fusion. I do not see a plugin named UNION. Can I achieve this functionality using any other existing plugin? Can I leverage a plugin designed for another purpose, to…
Ravi
  • 11
  • 2
0
votes
0 answers

CDAP Batch Sink plugin's destroy lifecycle hook is not getting called

I'm trying to develop a plugin similar to the HTTP sink plugin that comes by default in CDAP. I need to buffer incoming data and send it out via an HTTP POST whenever the buffer becomes full. I'm using Cloud Data Fusion version 6.7.2 in GCP. I…
Rahul R
  • 11
  • 2
0
votes
0 answers

How to pass Spark parameter values from Airflow to CDAP Data Fusion as runtime arguments

I am running a GCP Cloud Data Fusion pipeline from a Composer (Airflow) DAG. I want to pass a few Spark parameters at run time from the Airflow DAG to the CDF pipeline, so that they take effect on the Dataproc cluster during Spark job execution. Parameters like…
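(One common way to pass runtime arguments from Airflow is the CloudDataFusionStartPipelineOperator from the Google provider, which accepts a runtime_args dict. A sketch under that assumption; the instance name, region, pipeline name, and argument keys are placeholders, and the pipeline or its compute profile must actually consume whatever keys are passed.)

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.datafusion import (
        CloudDataFusionStartPipelineOperator,
    )

    with DAG(
        dag_id="start_cdf_pipeline",          # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule_interval=None,
    ) as dag:
        start_pipeline = CloudDataFusionStartPipelineOperator(
            task_id="start_pipeline",
            location="us-central1",            # hypothetical region
            instance_name="my-cdf-instance",   # hypothetical Data Fusion instance
            pipeline_name="my_pipeline",       # hypothetical deployed pipeline
            # Runtime arguments are plain string key/value pairs; the keys below
            # are placeholders for whatever macros/properties the pipeline reads.
            runtime_args={
                "input_path": "gs://my-bucket/input/",
                "executor_memory": "4g",
            },
        )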
0
votes
0 answers

CDAP connectivity with Apache Hive

I have a Linux box with CDAP installed and I have configured the Hive Import and Export plugins in CDAP. On the same machine, I have Hadoop with Hive installed. I am able to start all of the Hadoop services, verified using the jps command, and create and query…
jay
  • 1
0
votes
0 answers

Unable to load BigQuery table from Data Fusion pipeline with parameter

I am trying to load a CSV file from GCS to a BigQuery table using a simple Data Fusion pipeline. The pipeline is parameterized using the GCS Argument Setter and a JSON file. I am getting an error while loading data to the target BQ table. Please refer to the screenshots…
0
votes
1 answer

Move only files that were read in a Google Cloud Data Fusion pipeline

Within a pipeline whose executions are limited in time (30 minutes), with a GCS bucket as its source and BigQuery as its target, after processing each file I want to move only the files that were processed in the pipeline, however in conditions and…
DevJonDoe
  • 1
  • 1
0
votes
0 answers

How to set the file name for GCS Move using runtime arguments with wildcards in a CDAP/Data Fusion pipeline

I need to move some files between buckets after they are processed in the pipeline; however, I have come across files that contain characters like "+" or "-" in their names (example: data+1+2132121.json). Following the documentation in this How to…
DevJonDoe
  • 1
  • 1
0
votes
1 answer

Mapping a JSON result in Data Fusion on GCP

I'm trying to retrieve information from the Facebook Graph API and convert the result into a readable form in Google Data Fusion using the HTTP plugin, and then upload the results into Google BigQuery. I've used this method in the past, but in this particular…
0
votes
1 answer

Data Fusion Endpoint API call from Python

If I want to execute the shell commands below from Python (through an API call), how can I do that? AUTH_TOKEN=$(gcloud auth print-access-token) CDAP_ENDPOINT=$(gcloud beta data-fusion instances describe \ --location=${CDF_REGION} \ …
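(A sketch of one way to do this from Python, assuming the google-auth and requests libraries: fetch an access token with application default credentials, the equivalent of gcloud auth print-access-token, and read the instance's apiEndpoint from the Cloud Data Fusion REST API. The project, region, and instance names below are placeholders.)

    import google.auth
    import google.auth.transport.requests
    import requests

    # Hypothetical placeholders -- substitute your own values.
    PROJECT = "my-project"
    CDF_REGION = "us-central1"
    INSTANCE = "my-instance"

    # Equivalent of: gcloud auth print-access-token
    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    credentials.refresh(google.auth.transport.requests.Request())
    auth_token = credentials.token

    # Equivalent of: gcloud beta data-fusion instances describe ...
    instance = requests.get(
        f"https://datafusion.googleapis.com/v1/projects/{PROJECT}"
        f"/locations/{CDF_REGION}/instances/{INSTANCE}",
        headers={"Authorization": f"Bearer {auth_token}"},
    )
    instance.raise_for_status()
    cdap_endpoint = instance.json()["apiEndpoint"]
    print(cdap_endpoint)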
0
votes
1 answer

Adding new column in target in GCP Data Fusion

How do I add a new column with static values in the target in GCP Data Fusion (with/without Wrangler)?
nomadSK25
  • 2,350
  • 3
  • 25
  • 36
0
votes
1 answer

Sorting in Data Fusion

I have some very simple questions around Data Fusion and its ETL transformation capabilities: How do you sort a file in Data Fusion using a particular column or columns? I couldn't find any plugin or any directive in Wrangler. How do you perform a cumulative…
Piyush Lohana
  • 43
  • 2
  • 7