
Let's say we have several event flows. Some of them load reference data into Hazelcast event tables, while others are the actual event processors.

My objective is that when the WSO2 CEP server starts up, it first runs the event flows that load reference data into the Hazelcast event tables, and only then starts the other flows.

This would help us maintain reference-data consistency across all the event-processor flows.


2 Answers


I can see no option other than loading within individual execution plans. There are two options:

  1. Use a trigger to load reference data periodically from the RDBMS into Hazelcast; the actual processing then reads from the Hazelcast table (this execution plan is provided below).
  2. Load from the RDBMS and cache it.

So, at this moment, my questions are:

  1. Which one is better in terms of memory utilization?
  2. Which one is better in terms of event-processing speed?
  3. Please suggest if there is any other, better way.

Execution Plan

@Plan:name('ExecutionPlan')

/* Facts/Events stream definitions */
@Import('actions:1.0.0')
define stream actions (meta_name string, correlation_id int);

@Export('userActions:1.0.0')
define stream userActions (meta_username string, meta_actionname string);

/* Dimension (event) table definitions */
-- table from RDBMS
@from(eventtable = 'rdbms', datasource.name = 'PG', table.name = 'users')
@IndexBy('id')
define table DBUsers (id int, name string);

-- table from Hazelcast
@from(eventtable = 'hazelcast', collection.name = 'hzUsers')
@IndexBy('id')
define table hzUsers (id int, name string);

/* Load dimension tables from RDBMS to Hazelcast periodically, using a trigger */
define trigger periodicTrigger at every 30 sec;

from periodicTrigger join DBUsers
select DBUsers.id as id, DBUsers.name as name
insert into hzUsers;

/* Actual processing query */
from actions as A
join hzUsers as H
on A.correlation_id == H.id
select H.name as meta_username, A.meta_name as meta_actionname
insert into userActions;
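
Option 2 (load from the RDBMS and cache in memory) could be sketched in the same style. This is only an illustration: the `inMemUsers` table and `cacheTrigger` names are made up here, it reuses the `DBUsers` definition above, and each execution plan would hold its own copy of the cache.

/* Plain in-memory event table (local to this execution plan) */
@IndexBy('id')
define table inMemUsers (id int, name string);

/* Refresh the in-memory cache periodically from the RDBMS table */
define trigger cacheTrigger at every 30 sec;

from cacheTrigger join DBUsers
select DBUsers.id as id, DBUsers.name as name
insert into inMemUsers;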
  • What did you mean by "Load from RDBMS and cache it"? Do you mean accessing the RDBMS table directly? Or accessing RDBMS tables from each execution plan and periodically caching them into in-memory tables? And how many execution plans are we looking at? And what is the caching interval? – Grainier Oct 04 '16 at 16:30
  • By "Load from RDBMS and cache it" I mean "access RDBMS tables from each execution plan and periodically cache them into in-memory tables". How many execution plans are we looking at? It could be 30-50. What is the caching interval? It depends on the use case, but in most cases it would be once a day, and for a few it could be as low as 10 seconds. – Obaid Oct 05 '16 at 02:17
  • If we consider the read time of each event table, it goes roughly In-Memory (same VM) < Hz < RDBMS. So you should focus on minimising the RDBMS access frequency. What you can do is use RDBMS-to-Hz caching for frequently cached tables (those with small caching intervals, shared across several execution plans). For others, use RDBMS-to-In-Memory caching. Also, for huge reference tables that are shared across EPs, use RDBMS-to-Hz caching. – Grainier Oct 06 '16 at 09:19
  • I think that if I can connect to an external Hazelcast cluster, my problem will be solved. Is there any documentation on how to connect to an external Hazelcast cluster? – Obaid Oct 06 '16 at 09:22
  • @Grainier, thanks for the clarifications. I have a similar understanding, but I wanted to verify it with experts. Anyway, as mentioned in my last comment, can I connect to an external Hazelcast cluster? It could make everything simpler. – Obaid Oct 06 '16 at 09:30
  • Please refer to the [Hazelcast event table documentation](https://docs.wso2.com/display/CEP420/SiddhiQL+Guide+3.1#SiddhiQLGuide3.1-Hazelcasteventtable). You can use `cluster.addresses='ip:port,ip2:port2'` in the event table annotation to connect to an external cluster. – Grainier Oct 06 '16 at 09:54
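
Based on Grainier's last comment, pointing the Hazelcast event table at an external cluster should look something like the following sketch; the addresses are placeholders, and `collection.name` matches the table used in the execution plan above.

@from(eventtable = 'hazelcast', collection.name = 'hzUsers', cluster.addresses = '10.0.0.1:5701,10.0.0.2:5701')
@IndexBy('id')
define table hzUsers (id int, name string);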

I have checked the external Hazelcast cluster option, and it seems to add extra overhead: you need to create a DataSerializable class for each table type.

So, I have decided on the following for storing dimension/reference data for CEP:

  1. For a fully open-source project, I will go with the approach described in my other answer; please read the comments there, especially the 2nd (Obaid) and 3rd (Grainier).

  2. For commercial projects, I will go with VoltDB.

Thanks all, especially @Grainier.
