
I want to trigger a job (Hive, Pig, Oozie, etc.) when a file is transferred (by Flume) to a specific directory in the Hadoop Distributed File System. Is that possible?

1 Answer


It is possible, but only indirectly. Oozie does not support pure data-availability triggers. You have to set up a recurring coordinator with some frequency and add data availability as an additional condition.
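For illustration, a minimal coordinator sketch of that pattern might look like this (the dataset URI, app path, and dates are hypothetical; this runs daily but the workflow only fires once the expected input directory exists):

```xml
<coordinator-app name="file-trigger-coord" frequency="${coord:days(1)}"
                 start="2014-02-13T17:00Z" end="2015-02-13T17:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <!-- Hypothetical dataset: the HDFS directory Flume writes into -->
    <dataset name="flume-input" frequency="${coord:days(1)}"
             initial-instance="2014-02-13T17:00Z" timezone="UTC">
      <uri-template>hdfs://namenode/flume/landing/${YEAR}${MONTH}${DAY}</uri-template>
      <!-- The instance counts as available once this flag file exists -->
      <done-flag>_SUCCESS</done-flag>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="input" dataset="flume-input">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <!-- Hypothetical path to the deployed workflow.xml -->
      <app-path>hdfs://namenode/apps/my-workflow</app-path>
    </workflow>
  </action>
</coordinator-app>
```

If the data has not materialized by the nominal time, the coordinator action simply waits (up to its timeout) instead of running the workflow.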

This is quite a common question about Oozie; unfortunately, the documentation does not cover it well.

Jakub Kotowski
  • "Some frequency" means at a particular interval? Also, if I transfer the file using the Flume Java API, is it possible to trigger the job via that API? – user2645257 Feb 13 '14 at 13:01
  • Frequency: http://oozie.apache.org/docs/3.3.2/CoordinatorFunctionalSpec.html#a4._Datetime_Frequency_and_Time-Period_Representation — you create a coordinator to execute a workflow with a particular frequency, e.g. once a day at 5 pm. Thanks to the data-availability condition (specified as an Input Event: http://oozie.apache.org/docs/3.3.2/CoordinatorFunctionalSpec.html#a6.1.4._Input_Events ), the workflow will run only if the data is available. Oozie also provides an API which you can use to start a workflow (without a coordinator), so I guess you could do that from Java after your Flume transfer. – Jakub Kotowski Feb 13 '14 at 13:56
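The Java API route mentioned in the comment above could be sketched roughly as follows (an assumption-laden sketch: it needs the `oozie-client` library on the classpath and a running Oozie server; the server URL, HDFS paths, and host names are hypothetical):

```java
import java.util.Properties;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.OozieClientException;

public class TriggerAfterFlume {
    public static void main(String[] args) throws OozieClientException {
        // Hypothetical Oozie server URL
        OozieClient oozie = new OozieClient("http://oozie-host:11000/oozie");

        Properties conf = oozie.createConfiguration();
        // Hypothetical HDFS path to the deployed workflow application
        conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode/apps/my-workflow");
        conf.setProperty("nameNode", "hdfs://namenode");
        conf.setProperty("jobTracker", "jobtracker-host:8032");

        // Submit and start the workflow right after the Flume transfer completes
        String jobId = oozie.run(conf);
        System.out.println("Started workflow " + jobId);
    }
}
```

You would call this (or equivalent code) from the same process that drives the Flume transfer, once the transfer is known to have finished.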