
I want to trigger a job (Hive, Pig, Oozie, etc.) when a file is transferred (by Flume) to a specific directory in the Hadoop Distributed File System. Is that possible?

1 Answer


It is possible, but only indirectly. Oozie does not support pure data-availability triggers. You have to set up a recurring coordinator with some frequency and add data availability as an additional condition.
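For illustration, a minimal coordinator sketch of that pattern might look like this (the dataset URI, app path, and dates are hypothetical; this runs daily but the workflow only fires once the expected input directory exists):

```xml
<coordinator-app name="file-trigger-coord" frequency="${coord:days(1)}"
                 start="2014-02-13T17:00Z" end="2015-02-13T17:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <!-- Hypothetical dataset: the HDFS directory Flume writes into -->
    <dataset name="flume-input" frequency="${coord:days(1)}"
             initial-instance="2014-02-13T17:00Z" timezone="UTC">
      <uri-template>hdfs://namenode/flume/landing/${YEAR}${MONTH}${DAY}</uri-template>
      <!-- The instance counts as available once this flag file exists -->
      <done-flag>_SUCCESS</done-flag>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="input" dataset="flume-input">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <!-- Hypothetical path to the deployed workflow.xml -->
      <app-path>hdfs://namenode/apps/my-workflow</app-path>
    </workflow>
  </action>
</coordinator-app>
```

If the data has not materialized by the nominal time, the coordinator action simply waits (up to its timeout) instead of running the workflow.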

This is quite a common question about Oozie; unfortunately, the documentation does not cover it well.

Jakub Kotowski
  • "Some frequency" means at a particular interval? Also, if I transfer the file using the Flume Java API, is it possible to trigger the job via that API? – user2645257 Feb 13 '14 at 13:01
  • Frequency: http://oozie.apache.org/docs/3.3.2/CoordinatorFunctionalSpec.html#a4._Datetime_Frequency_and_Time-Period_Representation — you create a coordinator to execute a workflow with a particular frequency, e.g. once a day at 5 pm. Thanks to the data-availability condition (specified as an Input Event: http://oozie.apache.org/docs/3.3.2/CoordinatorFunctionalSpec.html#a6.1.4._Input_Events ), the workflow will run only if the data is available. Oozie also provides an API which you can use to start a workflow (without a coordinator), so I guess you could do that from Java after your Flume transfer. – Jakub Kotowski Feb 13 '14 at 13:56
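The Java API route mentioned in the comment above could be sketched roughly as follows (an assumption-laden sketch: it needs the `oozie-client` library on the classpath and a running Oozie server; the server URL, HDFS paths, and host names are hypothetical):

```java
import java.util.Properties;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.OozieClientException;

public class TriggerAfterFlume {
    public static void main(String[] args) throws OozieClientException {
        // Hypothetical Oozie server URL
        OozieClient oozie = new OozieClient("http://oozie-host:11000/oozie");

        Properties conf = oozie.createConfiguration();
        // Hypothetical HDFS path to the deployed workflow application
        conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode/apps/my-workflow");
        conf.setProperty("nameNode", "hdfs://namenode");
        conf.setProperty("jobTracker", "jobtracker-host:8032");

        // Submit and start the workflow right after the Flume transfer completes
        String jobId = oozie.run(conf);
        System.out.println("Started workflow " + jobId);
    }
}
```

You would call this (or equivalent code) from the same process that drives the Flume transfer, once the transfer is known to have finished.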