Questions tagged [apache-falcon]

Apache Falcon - Feed management and data processing platform

From the docs:

Falcon is a feed processing and feed management system aimed at making it easier for end consumers to onboard their feed processing and feed management on Hadoop clusters.

Why?

  • Establishes relationships between various data and processing elements in a Hadoop environment
  • Provides feed management services such as feed retention, replication across clusters, archival, etc.
  • Makes it easy to onboard new workflows/pipelines, with support for late data handling and retry policies
  • Integrates with metastores/catalogs such as Hive/HCatalog
  • Provides notifications to end customers based on the availability of feed groups (logical groups of related feeds that are likely to be used together)
  • Enables use cases for local processing in a colo and global aggregations
  • Captures lineage information for feeds and processes
13 questions
10
votes
2 answers

Apache NiFi vs Apache Airflow vs Apache Falcon? Which suits best in the scenario below?

I am developing a solution in Java that communicates through REST APIs with a set of devices belonging to different vendors. For each vendor, there is a set of processes that I have to perform inside my solution. However, these processes…
Selaka Nanayakkara
  • 3,296
  • 1
  • 22
  • 42
3
votes
1 answer

Falcon's role in the Hadoop ecosystem

I am supposed to work on cluster mirroring, where I have to set up an HDFS cluster similar to an existing one (same master and slaves), copy the data to the new cluster, and then run the same jobs as is. I have read about Falcon as a feed processing and a…
Atom
  • 768
  • 1
  • 15
  • 35
2
votes
1 answer

Can we install Apache Falcon on HDP 3?

Apache Falcon can be installed with HDP 2.x, but I could not find a way to install it with HDP 3.x. Is there a way to install Falcon with HDP 3.x?
Aman
  • 475
  • 2
  • 6
  • 10
1
vote
2 answers

Pipeline Dependency Graph

I am looking to create a dependency graph for a few pipelines in my cluster. I am trying to show the start and end points of my data and all the flows of data between the two points. I am looking to use either Apache Airflow or Apache Falcon to…
1
vote
0 answers

Adding a custom engine in Apache Falcon

Falcon currently supports 4 engines: Oozie, Pig, Hive, and Spark. Is it possible to add another engine? I know I can run my own scripts in Oozie, but what I want is to add my own custom engine in Apache Falcon. Can anyone please guide me on…
1
vote
1 answer

Apache Falcon: Setting up a data pipeline in an actual cluster [Failed to load data, Error: 400 Bad request]

I am trying to implement the data pipeline example by Hortonworks in an actual cluster. I have HDP 2.2 installed in my cluster but am getting the following error in the UI for the Processes and Datasets tabs: Failed to load data. Error:…
Nitin Kumar
  • 765
  • 1
  • 11
  • 26
0
votes
1 answer

Unable to schedule Falcon process - Could not perform authorization operation, java.io.IOException: Couldn't set up IO streams

Hi, I am trying to schedule a Falcon process using the Falcon CLI and the Falcon service user on a Kerberised cluster. I am getting the following error message: ERROR: Bad…
Sam
  • 358
  • 2
  • 3
  • 15
0
votes
1 answer

Apache Falcon feed is not listed while creating a process

I created a feed using the Falcon UI, but it is not displayed in the drop-down of the process creation steps. It also does not appear when listing the feeds through the command line. But the feed is available in Falcon UI search. Any other steps…
Biju CD
  • 4,999
  • 11
  • 34
  • 55
0
votes
0 answers

Oozie Error Code: E1100 & ERROR, reason: Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]

Hello, I am trying to do this example, Hadoop Data Pipeline. Here I am running a Flume agent, where Flume copies files from local to HDFS and Falcon does the job of processing the data files, and after the data is processed, Hive processing lineage will be…
Akki
  • 493
  • 1
  • 11
  • 23
0
votes
2 answers

Falcon cluster entity submission

I am getting an error when I try to submit my cluster entity in Falcon. Output error on submission: org.apache.falcon.client.FalconCLIException: Bad Request;Cluster definition missing required namenode credential property:…
0
votes
1 answer

Should the Falcon Prism be installed on a separate machine from the existing clusters?

I am trying to understand the setup for a Falcon distributed cluster. I have Cluster A and Cluster B, both with their own Falcon servers (and NameNode, Oozie, Hive, etc.). Now, to install the Prism, what would be the best approach? Shall I install it on one…
proutray
  • 1,943
  • 3
  • 30
  • 48
0
votes
2 answers

Apache Falcon data backup

I am not able to back up the data from one Hadoop cluster to another using Apache Falcon. What are the methods for backing up data from one cluster to another? Is there any process entity or Oozie workflow that is needed to back up data from one…
rocky
  • 13
  • 4
0
votes
1 answer

Falcon vs WANdisco Non-stop

The use case is: I need to copy all my data from an HDFS cluster to another cluster with the same setup of masters and slaves, then release the previous cluster and start running my jobs in the new cluster. I have read about Apache Falcon and…