2

How can I create a batch processing application with Apache Apex?

All the examples I've found are streaming applications, which never terminate. I would like my app to shut down once it has processed all the data.

Thanks

brusli
Krever

2 Answers

4

What is your use-case? Supporting batch natively is on the roadmap and is being worked on right now.

Alternatively, until then, once you are sure that your processing is done, the input operator can throw a ShutdownException, which will propagate through the DAG and shut it down.
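To illustrate the control flow only, here is a minimal sketch in plain Java. It does not use the Apex library: ShutdownSignal, FiniteInputOperator and the run() loop are hypothetical stand-ins for Apex's ShutdownException, an input operator and the engine, and the squaring step is just placeholder processing. The idea it shows is that the operator checks at the end of each window whether its finite source is exhausted and, if so, throws the shutdown signal so the pipeline stops.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Stand-in types for illustration only; these are NOT Apache Apex classes.
class BatchShutdownSketch {

    /** Hypothetical stand-in for the engine's shutdown exception. */
    static class ShutdownSignal extends RuntimeException {
        ShutdownSignal() {
            super("all input processed");
        }
    }

    /** Hypothetical input operator that reads from a finite source. */
    static class FiniteInputOperator {
        private final Iterator<Integer> source;
        private final int tuplesPerWindow;
        private final List<Integer> sink;

        FiniteInputOperator(List<Integer> data, int tuplesPerWindow, List<Integer> sink) {
            this.source = data.iterator();
            this.tuplesPerWindow = tuplesPerWindow;
            this.sink = sink;
        }

        /** Emits at most tuplesPerWindow tuples within the current window. */
        void emitTuples() {
            for (int i = 0; i < tuplesPerWindow && source.hasNext(); i++) {
                int value = source.next();
                sink.add(value * value); // placeholder for downstream processing
            }
        }

        /** Custom completion check: signal shutdown once the source is exhausted. */
        void endWindow() {
            if (!source.hasNext()) {
                throw new ShutdownSignal();
            }
        }
    }

    /** Toy engine loop: runs windows until the operator signals shutdown. */
    static List<Integer> run(List<Integer> data, int tuplesPerWindow) {
        List<Integer> sink = new ArrayList<>();
        FiniteInputOperator operator = new FiniteInputOperator(data, tuplesPerWindow, sink);
        try {
            while (true) {
                operator.emitTuples();
                operator.endWindow();
            }
        } catch (ShutdownSignal done) {
            // Treated as a normal, clean termination of the batch.
        }
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(1, 2, 3, 4, 5), 2));
    }
}
```

In a real Apex application the completion check would live in your input operator and the engine, not a hand-written loop, would react to the exception.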

Let us know if you need further details.

  • I'm writing a comparison of almost all open-source big data processing engines as my MSc thesis. I wanted to create an Apex batch part of it (in line with MapReduce, Flink and Spark). I will probably skip it for now and go on with stream comparisons. – Krever Nov 28 '16 at 16:14
  • Sure. The way to use it is: in your endWindow() call, check whether your task is complete (some custom logic would be required). If your task is complete, throw a ShutdownException and your entire pipeline will shut down. – Sandeep Deshmukh Dec 01 '16 at 15:08
3

You can add an exit condition before running the app. For example:

public void testMapOperator() throws Exception
{
  LocalMode lma = LocalMode.newInstance();
  DAG dag = lma.getDAG();

  NumberGenerator numGen = dag.addOperator("numGen", new NumberGenerator());
  FunctionOperator.MapFunctionOperator<Integer, Integer> mapper
      = dag.addOperator("mapper", new FunctionOperator.MapFunctionOperator<Integer, Integer>(new Square()));
  ResultCollector collector = dag.addOperator("collector", new ResultCollector());

  dag.addStream("raw numbers", numGen.output, mapper.input);
  dag.addStream("mapped results", mapper.output, collector.input);

  // Create local cluster
  LocalMode.Controller lc = lma.getController();
  lc.setHeartbeatMonitoringEnabled(false);

  // Condition to exit the application
  ((StramLocalCluster)lc).setExitCondition(new Callable<Boolean>()
  {
    @Override
    public Boolean call() throws Exception
    {
      return TupleCount == NumTuples;
    }
  });

  lc.run();

  Assert.assertEquals(sum, 285);
}

For the complete code, refer to https://github.com/apache/apex-malhar/blob/master/stream/src/test/java/org/apache/apex/malhar/stream/FunctionOperator/FunctionOperatorTest.java

Scorpio
  • How about some more general solution in terms of runtime environment? I would like to have the possibility to choose if it's local or cluster env. – Krever Nov 28 '16 at 13:57