I need to create data pipelines in Hadoop. I already have data import, data export, and data-cleaning scripts set up, and now need to combine them into a pipeline.

I have been using Oozie to schedule data import and export, but now I need to integrate R scripts for the data-cleaning step as well.

I see that Falcon is used for this purpose.

  1. How do I install Falcon on Cloudera?
  2. What other tools are available for building data pipelines in Hadoop?
simo kaur
  • You can invoke R from a shell action in Oozie. – abhiieor Aug 25 '16 at 18:50
  • Code, if you need it: `export engine=$1; export hive_db=$2; export Rcode=NeighborGroupingState.R; Rscript --vanilla ${Rcode} $1 $2 --hiveconf tez.credentials.path=${HADOOP_TOKEN_FILE_LOCATION} --hiveconf mapreduce.job.credentials.binary=${HADOOP_TOKEN_FILE_LOCATION}` – abhiieor Aug 25 '16 at 18:52
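
The one-liner in the comment above could be laid out as a small wrapper script for an Oozie shell action to call. This is only a sketch under assumptions: the function name `run_r_step`, the `RCODE` override, and the `DRY_RUN` switch are hypothetical additions for illustration, and the `--hiveconf` arguments are simply passed through to the R script as in the comment, only when `HADOOP_TOKEN_FILE_LOCATION` happens to be set in the environment.

```shell
#!/bin/sh
# Sketch of a wrapper script that an Oozie shell action could run.
# NeighborGroupingState.R comes from the comment above; run_r_step,
# RCODE and DRY_RUN are hypothetical names used for illustration.

run_r_step() {
    engine=$1        # first argument supplied by the workflow
    hive_db=$2       # second argument supplied by the workflow
    rcode=${RCODE:-NeighborGroupingState.R}

    # Build the Rscript argument list in the function's positional parameters.
    set -- --vanilla "$rcode" "$engine" "$hive_db"

    # Forward the delegation-token location only when the variable is set.
    if [ -n "${HADOOP_TOKEN_FILE_LOCATION:-}" ]; then
        set -- "$@" \
            --hiveconf "tez.credentials.path=$HADOOP_TOKEN_FILE_LOCATION" \
            --hiveconf "mapreduce.job.credentials.binary=$HADOOP_TOKEN_FILE_LOCATION"
    fi

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "Rscript $*"    # print the command instead of running it
    else
        Rscript "$@"
    fi
}

# Run only when both arguments are supplied on the command line.
if [ "$#" -ge 2 ]; then
    run_r_step "$1" "$2"
fi
```

In the workflow definition, the shell action would name this script in its `<exec>` element and pass the two values through `<argument>` elements; setting `DRY_RUN=1` locally prints the command so it can be checked before scheduling.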

1 Answer

2) I'm tempted to answer NiFi from Hortonworks. Since this post on LinkedIn, it has grown a lot and is very close to replacing Oozie. At the time of writing, the main difference between Oozie and NiFi is where they run: NiFi on an external cluster, Oozie inside Hadoop.

ozw1z5rd