I want to trigger a job (Hive, Pig, Oozie, etc.) when a file is transferred (by Flume) into a specific directory in the Hadoop Distributed File System. Is that possible?
1 Answer
It is possible, but only indirectly. Oozie does not support pure data-availability triggers. You have to set up a recurring coordinator with some frequency and add data availability as an additional condition.
This is quite a common question about Oozie; unfortunately, the documentation is poor on this point.
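A minimal coordinator sketch of this pattern, assuming hourly polling; the app name, HDFS paths, dates, and the `_SUCCESS` marker file are all assumptions for illustration, not values from the question:

```xml
<coordinator-app name="flume-dir-watcher" frequency="${coord:hours(1)}"
                 start="2014-02-13T00:00Z" end="2015-02-13T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <!-- Dataset pointing at the directory Flume writes into -->
    <dataset name="flume-input" frequency="${coord:hours(1)}"
             initial-instance="2014-02-13T00:00Z" timezone="UTC">
      <uri-template>hdfs://namenode/flume/landing/${YEAR}${MONTH}${DAY}${HOUR}</uri-template>
      <!-- The coordinator action only materializes once this marker file exists -->
      <done-flag>_SUCCESS</done-flag>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="input" dataset="flume-input">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>hdfs://namenode/apps/my-workflow</app-path>
    </workflow>
  </action>
</coordinator-app>
```

Each scheduled run checks the input event; the workflow only runs when the dataset instance (here, the hour's landing directory with its done-flag) is actually present, which is the "data availability as an additional condition" described above.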

Jakub Kotowski
- "Some frequency" means at a particular interval? And if I transfer using the Flume Java API, is it possible to trigger it via that API? – user2645257 Feb 13 '14 at 13:01
- Frequency: http://oozie.apache.org/docs/3.3.2/CoordinatorFunctionalSpec.html#a4._Datetime_Frequency_and_Time-Period_Representation You create a coordinator to execute a workflow with a particular frequency, e.g. once a day at 5pm. Thanks to the data-availability condition (specified as an Input Event: http://oozie.apache.org/docs/3.3.2/CoordinatorFunctionalSpec.html#a6.1.4._Input_Events ), the workflow will run only if the data is available. Oozie also provides an API which you can use to start a workflow (without a coordinator); you could call that from Java after your Flume transfer, I guess. – Jakub Kotowski Feb 13 '14 at 13:56
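To illustrate the API route mentioned in the comment above: a sketch of submitting a workflow from Java after a transfer completes. It builds the standard Oozie job configuration (the property key `oozie.wf.application.path` is the real Oozie property naming the workflow's HDFS directory); the HDFS path, user name, and server URL are assumptions, and the actual submission via `OozieClient` from the `oozie-client` library is shown in comments since it needs a running Oozie server:

```java
import java.util.Properties;

public class OozieSubmit {
    // Builds the job configuration for running an Oozie workflow.
    // "oozie.wf.application.path" is Oozie's standard property for the
    // workflow's HDFS directory; the path value here is a placeholder.
    static Properties workflowConf(String appPath) {
        Properties conf = new Properties();
        conf.setProperty("oozie.wf.application.path", appPath);
        conf.setProperty("user.name", "flume"); // hypothetical submitting user
        return conf;
    }

    public static void main(String[] args) {
        Properties conf = workflowConf("hdfs://namenode/apps/my-workflow");

        // With oozie-client on the classpath and a server running, submission
        // (e.g. right after the Flume transfer returns) would look like:
        //   OozieClient client = new OozieClient("http://localhost:11000/oozie");
        //   String jobId = client.run(conf); // starts the workflow immediately
        System.out.println(conf.getProperty("oozie.wf.application.path"));
    }
}
```

This starts a plain workflow on demand, with no coordinator involved, so there is no polling delay; the trade-off is that your own code, not Oozie, is now responsible for noticing that the data has arrived.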