I found data loss in Esper (v7.1.0) when the inbound thread pool is enabled. Here is a simple example that demonstrates this strange behaviour:

    Configuration config = new Configuration();
    // set up concurrent processing
    config.getEngineDefaults().getThreading().setThreadPoolInbound(true);

    EPServiceProvider epService = EPServiceProviderManager.getDefaultProvider(config);

    // simple schema
    epService.getEPAdministrator().createEPL("create objectarray schema LogLine as (account_name string, value int) ");
    // event for terminating context partition
    epService.getEPAdministrator().createEPL("create schema TerminateEvent() ");

    // Allocates context partition for each account_name. Start it on LogLine event and terminate on TerminateEvent.
    epService.getEPAdministrator()
            .createEPL("create context NestedCtx " + 
                       "context InitCtx start LogLine end TerminateEvent ," + 
                       "context AccountCtx partition by account_name from LogLine");
    // select to collect count of events per account_name.
    EPStatement statement = epService.getEPAdministrator().createEPL(" context NestedCtx select account_name, count(*) as total from LogLine output last when terminated ");
    // attach listener for printing results 
    statement.addListener(new UpdateListener() {

        @Override
        public void update(EventBean[] newEvents, EventBean[] oldEvents) {
            for (EventBean eventBean : newEvents) {
                String properties = Arrays.stream(eventBean.getEventType().getPropertyNames()).map((prop) -> {
                    return prop + " " + eventBean.get(prop);
                }).collect(Collectors.joining("; "));
                System.out.println(properties);
            }

        }
    });
    //send 3 LogLine events
    epService.getEPRuntime().sendEvent(new Object[] { "TEST", 10 }, "LogLine");
    epService.getEPRuntime().sendEvent(new Object[] { "TEST", 10 }, "LogLine");
    epService.getEPRuntime().sendEvent(new Object[] { "TEST", 10 }, "LogLine");

    // send terminate event in order to get results
    epService.getEPRuntime().sendEvent(Collections.emptyMap(), "TerminateEvent");
    System.out.println("finish");

The problem is that the UpdateListener is not called when concurrent processing is enabled. The result is printed only when I disable the inbound thread pool. What is the reason for this behaviour?

Taras

1 Answer

Inbound threading can change the order in which events get processed, as the JVM can process queued tasks in any order. Therefore, when your use case requires ordered processing of events, inbound threading is not the right choice. Your application code can instead allocate its own queue/threads and assign events to those threads in a way that preserves order. For example, as discussed in this StackOverflow question.
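
A minimal sketch of what "allocate its own queue/threads" could look like, assuming the application routes every event through a small helper of its own. `OrderedDispatcher`, `submit` and `shutdownAndWait` are hypothetical names, not part of Esper, and the `sink` stands in for whatever call hands the event to the engine. Events that share a key go to the same single-threaded lane, so they reach the sink in the order they were submitted.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper (not an Esper class): key-partitioned, order-preserving dispatch. */
class OrderedDispatcher {
    private final ExecutorService[] lanes;
    private final Consumer<Object> sink; // assumption: forwards the event to the engine

    OrderedDispatcher(int laneCount, Consumer<Object> sink) {
        this.lanes = new ExecutorService[laneCount];
        for (int i = 0; i < laneCount; i++) {
            // One thread per lane: tasks in a lane run strictly in submission order.
            lanes[i] = Executors.newSingleThreadExecutor();
        }
        this.sink = sink;
    }

    /** Events submitted with equal keys are delivered to the sink in submission order. */
    void submit(Object key, Object event) {
        int lane = Math.floorMod(key.hashCode(), lanes.length);
        lanes[lane].execute(() -> sink.accept(event));
    }

    /** Stops accepting events and waits for the queued ones to be delivered. */
    void shutdownAndWait() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        try {
            for (ExecutorService lane : lanes) {
                lane.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With inbound threading switched off, the sink would wrap the `sendEvent` call from the question. Ordering is only guaranteed within a lane, so an event that must be seen after others (such as the TerminateEvent) has to be submitted with the same key as the events it follows.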

user650839
  • Thanks for the answer. But I don't care about order in my example. I send 3 identical events _{ account_name: "TEST", value: 10 }_ and then submit a TerminateEvent in order to get the results. At the end I need to get _SELECT account_name, COUNT(*) ... GROUP BY account_name_, which is _{account_name:"TEST", count:3}_. Order is not important; I just want Esper to process all events without data loss. – Taras Apr 06 '18 at 07:00
  • The "context InitCtx start LogLine end TerminateEvent" means that analysis starts when a LogLine comes in. However, the JVM can pause the thread carrying that LogLine event, proceed to process all other events, and in the worst case process the "LogLine" last. With inbound threading, events can be processed in any order, and you must assume the worst one. – user650839 Apr 06 '18 at 14:38
  • Thanks again! Could you please provide a real-world use case for inbound threading? – Taras Apr 06 '18 at 19:14
  • select count(*) from MyEvent (and all its variants) – user650839 Apr 07 '18 at 22:56
  • I've disabled the inbound thread pool and started sending events concurrently. Unfortunately, the results are not stable. Here is the corresponding question: https://stackoverflow.com/questions/49738825/esper-loss-less-events-processing – Taras Apr 09 '18 at 17:55