Vertx | Global state of Verticles in a cluster

Question

Newbie alert.

I'm trying to write a simple module in Vert.x that polls the database (PostgreSQL) every 10 seconds and pushes the results to the clients. I plan to confine the blocking code (the JDBC queries) to a worker verticle, so that the layers above it stay completely non-blocking and async.

This module will be packaged as a jar and distributed to different apps (typically webapps), which can subscribe to the event bus via the JavaScript bridge.

My question: in a clustered environment where 5 processes of the webapp are running with the Vert.x module, how can I ensure that only one verticle queries the database? I don't want every verticle querying the database and adding load. Or is there a different way to think about this problem? I'm using Vert.x version 3.4.1.

If all workers register on the same event bus address and you send (rather than publish) a message to it, only one worker will consume it and start working on it. — Niraj Chauhan, Apr 14 '17 at 05:30
Sure, I understand that. My use case is that I need to poll the database every 10 seconds (meaning a message is fired to the event bus every 10 seconds, which triggers the handler to execute the query). But I don't want the verticles in every process firing that message and thereby triggering the JDBC calls in multiple processes. — user1189332, Apr 14 '17 at 06:07
First of all, why use JDBC when Vert.x offers a non-blocking PostgreSQL implementation? http://vertx.io/docs/vertx-mysql-postgresql-client/java/ Second, if the only thing you want to do is query the database every x seconds, why not create a separate verticle for it? You could call `ResultSet::toJson` to convert the query result to JSON and publish it on a reserved address of the `EventBus`. — 3limin4t0r, Apr 20 '17 at 18:32

score 0 · Answer 1 · answered Apr 15 '17 at 05:52

So there are 2 ways your verticle can be multiplied:

1) You deploy your verticle with multiple instances.
2) You cluster your Vert.x instances across different JVMs or different hosts.

You could try to control the number of instances of the verticle that executes the query. That means ensuring that the verticle exists in only one of your Vert.x instances and is deployed there with only one instance.

But this has several drawbacks:

Your deployment is not transparent, meaning your cluster nodes differ in their deployment structure.
If the cluster node running the query verticle dies, you have no fallback.

So the best thing is to deploy the verticle on all instances and synchronize it.

I see 3 possibilities:

1) Use Hazelcast (the cluster manager of Vert.x) to synchronize, for example with a cluster-wide lock: http://vertx.io/docs/apidocs/io/vertx/spi/cluster/hazelcast/HazelcastClusterManager.html#getLockWithTimeout-java.lang.String-long-io.vertx.core.Handler- There are also data structures available that are synchronized over the cluster: http://vertx.io/docs/apidocs/io/vertx/spi/cluster/hazelcast/HazelcastClusterManager.html#getSyncMap-java.lang.String-
2) Use your database as the synchronization point. You could add a simple table that stores the last execution time in millis. Each polling module first checks whether it is time to execute the next poll. If a polling module executes the poll, it also updates the time. The check and the update have to happen in one transaction with an explicit lock on the time table.
3) Use Redis with the GETSET command (https://redis.io/commands/getset). You can store the time in millis in a key, and GETSET ensures that the update of the time is atomic. So only the polling module that managed to set the key in Redis executes the poll.
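The lock-based options above (a Hazelcast lock, or a locked timestamp row in the database) share the same shape: take the lock, check when the last poll ran, and update the time only if a poll is due. Here is a minimal sketch of that logic in plain Java. `LockedPollGate` and `tryClaim` are made-up names, and `ReentrantLock` plus `ConcurrentHashMap` are only single-JVM stand-ins for a cluster-wide lock and a cluster-wide map, so treat it as an illustration of the idea rather than clustered code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: the lock and map below only coordinate threads in one JVM.
// In a cluster they would be replaced by a cluster-wide lock and shared map.
class LockedPollGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final ConcurrentMap<String, Long> state = new ConcurrentHashMap<>();
    private final long intervalMillis;

    LockedPollGate(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    /** Returns true if the caller should run the poll for this tick. */
    boolean tryClaim(long nowMillis, long lockTimeoutMillis) {
        boolean locked;
        try {
            locked = lock.tryLock(lockTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        if (!locked) {
            return false; // lock not obtained in time: skip this tick
        }
        try {
            Long last = state.get("lastPoll");
            if (last != null && nowMillis - last < intervalMillis) {
                return false; // somebody polled within the current interval
            }
            state.put("lastPoll", nowMillis); // record the claim before polling
            return true;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        LockedPollGate gate = new LockedPollGate(10_000);
        long now = System.currentTimeMillis();
        System.out.println("node A polls: " + gate.tryClaim(now, 100));
        System.out.println("node B polls: " + gate.tryClaim(now + 2_000, 100));
    }
}
```

Every node would call `tryClaim` from its own periodic timer and run the query only when it returns true. If the node that normally wins dies, the stored time goes stale and another node's next tick takes over, which is the fallback the single-deployment approach lacks.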

Here is an example of the Redis GETSET approach for your use case: https://github.com/swisspush/gateleen/blob/master/gateleen-scheduler/src/main/java/org/swisspush/gateleen/scheduler/Scheduler.java#L94 — haschibaschi, Apr 15 '17 at 05:56
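To make the GETSET idea concrete without a Redis server, here is a small sketch where `AtomicLong.getAndSet` stands in for GETSET on a single key. The class name `GetSetPollGate` and the choice to store the start of the current 10-second slot (instead of the raw current time) are my own assumptions, not taken from the linked scheduler. Storing the slot start means that a node which loses the race writes the same value the winner wrote, so losers do not keep pushing the timestamp forward.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: AtomicLong.getAndSet plays the role of Redis GETSET.
// Assumes non-negative timestamps and node clocks that agree to within one slot.
class GetSetPollGate {
    private final AtomicLong lastSlot = new AtomicLong(-1); // -1 means "key not set yet"
    private final long intervalMillis;

    GetSetPollGate(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    /** Returns true for exactly one caller per interval slot. */
    boolean tryClaim(long nowMillis) {
        // Round down to the start of the current slot, e.g. 10_004 -> 10_000.
        long slotStart = nowMillis - (nowMillis % intervalMillis);
        // Atomically write our slot and read what was stored before.
        long previous = lastSlot.getAndSet(slotStart);
        // Only the first writer of a new slot sees an older value.
        return previous < slotStart;
    }

    public static void main(String[] args) {
        GetSetPollGate gate = new GetSetPollGate(10_000);
        long now = System.currentTimeMillis();
        for (int node = 1; node <= 5; node++) {
            System.out.println("node " + node + " polls: " + gate.tryClaim(now));
        }
    }
}
```

With real Redis the returned previous value would arrive as a string (or nil on the first call) and would need parsing, and a node whose clock lags by more than one slot could cause a duplicate poll, so this is only a starting point.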

score 0 · Answer 2 · answered Apr 22 '17 at 14:39

I'm giving my naive solution here; I don't know whether it completely solves your problem, but here is my thought process.

1) The polling bit: yes, you can indeed have a worker verticle for the blocking calls (or, IMHO, you could go async here too, because there is already an async PostgreSQL client). For the every-10-seconds part, a code snippet like this can help:

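vertx.setPeriodic(10000, id -> {
  // This handler is called every 10 seconds.
  // fetchFromJdbc() is a placeholder for the blocking JDBC query, so this
  // timer must be set from a worker verticle, not from the event loop.
  JsonObject jdbcObject = fetchFromJdbc();
  vertx.eventBus().publish("INTRESTED_PARTIES", jdbcObject);
});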

2) For the listening part, all the other verticles can subscribe to the event bus, listen on that address, and receive the message whenever it is published.

3) To ensure that not every running instance of your jar starts polling the database, I think the best approach is not to deploy the polling verticle inside any jar at all, but to run it standalone using the vertx command line, like

vertx run DatabasePoller.java -cluster

And if you really want to be fancy, you could add Service Discovery to this: if the verticle's service is already registered, no other deployment would register it again.

But thumbs up for considering events to deliver that information; it is a much better way to handle inter-system communication.
