
I am working on a CEP project where I analyze logs from a file in bulk. The file is a compressed CSV file that is bulk-transferred to my analytics machine every hour. Each line contains an event with a timestamp for exactly when it happened during the previous hour.

Reading this file into a plain Java object is no problem and I will typically end up with something like this:

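import java.util.Date;

class MyEvent {
    private Date timestamp;  // when the event actually happened
    private String message;  // shortened to these fields only for simplicity
    private String source;
    private int count;
    public Date getTimestamp() { return timestamp; }
    public String getMessage() { return message; }
    public String getSource() { return source; }
    public int getCount() { return count; }
}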

So the problem is that this file may contain events that were written anywhere between 1 hour ago and 1 second ago, and the only way to know is to inspect the timestamp field in the event itself. When these events are loaded into Esper, they will all be loaded within a few seconds (there will probably be tens of thousands, loaded as fast as Esper can accept them).

Now, the analysis itself wants to calculate the average "count" per "source" every 5 minutes in Esper (nothing too complex). However, as all events are loaded within a few seconds, the time windows in Esper will be wrong: all events may fall within the same time window regardless of when they were produced. So my question is: is there any way to override what is counted as the event timestamp in Esper time windows?

The problem gets worse when a time window is split between two files that are loaded an hour apart.

Thank you.

agnsaft

1 Answer


This will do it:

select source, sum(count) from MyEvent group by source output all every 5 seconds

Esper also allows an external timer, so that application code can control time freely.
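
With external time, the engine clock is advanced by the application (for example from each event's own timestamp) instead of by the wall clock, so windows follow when events happened and not when they were loaded. The following is only a plain-Java sketch of that idea, independent of Esper: it assigns each event to a fixed window based on its timestamp and averages "count" per "source". The class EventTimeWindows, its nested Event type (a stand-in for MyEvent using epoch milliseconds) and the method averagePerSource are hypothetical names made up for this illustration, not Esper API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper, not part of Esper: windows events by their own timestamps. */
final class EventTimeWindows {

    /** Minimal stand-in for MyEvent; the timestamp is in epoch milliseconds. */
    static final class Event {
        final long timestampMillis;
        final String source;
        final int count;

        Event(long timestampMillis, String source, int count) {
            this.timestampMillis = timestampMillis;
            this.source = source;
            this.count = count;
        }
    }

    /**
     * Returns windowStart -> (source -> average count). windowStart is the event
     * timestamp rounded down to a multiple of windowMillis, so load time is irrelevant.
     */
    static Map<Long, Map<String, Double>> averagePerSource(List<Event> events, long windowMillis) {
        // windowStart -> source -> {sum of counts, number of events}
        Map<Long, Map<String, long[]>> totals = new TreeMap<>();
        for (Event e : events) {
            long windowStart = Math.floorDiv(e.timestampMillis, windowMillis) * windowMillis;
            long[] acc = totals
                    .computeIfAbsent(windowStart, k -> new TreeMap<>())
                    .computeIfAbsent(e.source, k -> new long[2]);
            acc[0] += e.count;
            acc[1]++;
        }

        // Convert the accumulated sums into averages.
        Map<Long, Map<String, Double>> averages = new TreeMap<>();
        for (Map.Entry<Long, Map<String, long[]>> window : totals.entrySet()) {
            Map<String, Double> perSource = new TreeMap<>();
            for (Map.Entry<String, long[]> s : window.getValue().entrySet()) {
                perSource.put(s.getKey(), (double) s.getValue()[0] / s.getValue()[1]);
            }
            averages.put(window.getKey(), perSource);
        }
        return averages;
    }
}
```

Calling EventTimeWindows.averagePerSource(events, 300_000L) groups events into 5-minute windows by their timestamps; events that straddle two hourly files end up in the same window as long as both files are passed in. In Esper itself the equivalent effect would come from driving the engine clock from the event timestamps, which assumes events are fed in timestamp order.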

user650839
  • The example you supplied was the same one I considered until I realized the offset of the event. Can you elaborate on the external timer comment? – agnsaft Jun 14 '14 at 23:52
  • "until you realized the offset of the event" - what would that mean? – user650839 Jun 16 '14 at 12:23
  • http://esper.codehaus.org/esper-5.0.0/doc/reference/en-US/html_single/index.html#api-controlling-time – user650839 Jun 16 '14 at 12:24
  • If expiring events based on time is the question, perhaps dump the events into a named window and use on-delete to delete with a where clause that considers time. – user650839 Jun 16 '14 at 12:25
  • I am sorry, by offset I meant the delay between when the event is generated and when it is fed into Esper (e.g. in bulk). That means that events generated 10 minutes apart may be fed into Esper with only a single second between them, or a whole hour between them, depending on the hourly bulk import. – agnsaft Jun 17 '14 at 13:08