2

All the Apex examples say that the first operator of a DAG should be an input operator. Can such an operator also appear somewhere in the middle of the DAG?

Consider a case where I need to fetch data from a database based on data that has just been processed by a previous operator. This would mean that an input operator sits somewhere in the middle of the DAG.

By definition, an input operator is one that does not have any input stream. But it also does the work of fetching data when a connector is used. So will it work if I fetch data somewhere in the middle of a DAG?

frewper

3 Answers

3

This is an interesting use case. You should be able to extend an input operator (say JdbcInputOperator, since you want to read from a database) and add an input port to it. This input port receives tuples from another operator in your DAG and updates the "where" clause of the JdbcInputOperator, so that it reads data based on those tuples. Hope that is what you were looking for.

Sanjay
  • Hi Sanjay, is this actually possible, considering that an InputOperator and a generic operator are processed differently? https://apex.apache.org/docs/apex/operator_development/#how-operator-works – Ajay Gupta Mar 01 '17 at 04:03
  • I just saw Vlad's answer. It clarifies the question in my previous comment. – Ajay Gupta Mar 01 '17 at 04:06
3

Yes, it is possible. You may extend an existing InputOperator and add InputPort(s) to it. In this case, the Apex platform will treat your operator as a generic operator and will not call InputOperator.emitTuples(). It becomes the responsibility of your extended operator to call super.emitTuples() or to emit directly on the output port(s).
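To make the control flow concrete, here is a minimal sketch of that pattern in plain Java. It deliberately does not use the Apex API: `TableInputOperator`, `MidDagLookupOperator`, `filterKey`, `outputPort` and the in-memory `table` are all hypothetical stand-ins (for an input operator, its extension, the "where" clause, an output port and the database). It only shows that, once an input port exists, the tuple callback is what triggers the read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an input operator: no input port, it only
// produces tuples when something calls emitTuples().
class TableInputOperator {
    // Stand-in for the database: rows grouped by a lookup key.
    protected final Map<Integer, List<String>> table;
    // Stand-in for an output port: collects everything that is emitted.
    protected final List<String> outputPort = new ArrayList<>();
    // Stand-in for the "where" clause; null means no filter has been set yet.
    protected Integer filterKey;

    TableInputOperator(Map<Integer, List<String>> table) {
        this.table = table;
    }

    // Reads the rows matching the current filter and emits them.
    public void emitTuples() {
        if (filterKey == null) {
            return; // nothing to look up yet
        }
        outputPort.addAll(table.getOrDefault(filterKey, List.of()));
    }
}

// Extension that adds an input port. Because the platform would now treat
// it as a generic operator, it has to trigger the read itself.
class MidDagLookupOperator extends TableInputOperator {
    MidDagLookupOperator(Map<Integer, List<String>> table) {
        super(table);
    }

    // Stand-in for the input port callback invoked once per incoming tuple.
    public void process(Integer key) {
        filterKey = key;      // update the "where" clause from upstream data
        super.emitTuples();   // emit explicitly; nobody else will call this
    }
}
```

In a real operator the read would go through the connector rather than a map, and emitting once per incoming tuple is only one possible choice; batching the reads per window would be another.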

Vlad Rozov
1

No, an input operator cannot be used in the middle of the DAG. As you have already pointed out, since it has no input stream, it cannot receive data from the previous operator.

For the example you describe, it would be better to write your own generic operator with an input stream. It would behave much like an input operator, in that it reads data from an external source, but it would do so based on the data arriving on the input stream.

Also, a point to note: if the query is too heavy, it's better to have an asynchronous thread query the database. This thread can write the results to a queue, from which the main thread reads the records and emits them on the output stream. This ensures that the main operator thread is not blocked and the operator won't fail.
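A rough sketch of that queue hand-off, using only `java.util.concurrent`. The names `AsyncFetcher`, `slowQuery`, `drainAvailable` and `awaitCompletion` are made up for illustration, and the `Supplier` stands in for the actual database call; this is not Apex code, just the threading pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Supplier;

// Runs a slow query on a background thread and hands the rows to the
// operator thread through a queue.
class AsyncFetcher {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Thread worker;

    // slowQuery is a hypothetical stand-in for the heavy database call.
    AsyncFetcher(Supplier<List<String>> slowQuery) {
        worker = new Thread(() -> queue.addAll(slowQuery.get()), "db-fetcher");
        worker.setDaemon(true); // do not keep the JVM alive for this thread
        worker.start();
    }

    // Called from the operator thread. Returns at most max rows that are
    // already queued and never waits for the query to finish.
    List<String> drainAvailable(int max) {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, max);
        return batch;
    }

    // Waits for the background query to finish; useful for shutdown and tests.
    void awaitCompletion() {
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
    }
}
```

The operator thread would call `drainAvailable` on each pass and emit whatever it gets, so a slow query only delays the data, not the operator itself. The queue here is unbounded; a bounded queue would add back-pressure at the cost of making the worker wait when the operator falls behind.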

Ajay Gupta