I have two systems, call them A and B. When a significant object changes in A, A sends it to B through Apache Camel. However, I have run into a case where A keeps a change log for an object, while B must reflect only the object's current state. Moreover, the change log in A can contain "future" records, meaning that a change to the object's state is scheduled for some moment in the future. A user of system A can edit this change log: remove change records, add new ones with any timestamp (past or future), and even update existing ones. A sends all of these change records to B, but B needs only the current state of the object.

Note that I can query objects from A, but A is a performance-critical system, so I am not going to query it, as that could cause additional load. Also, the API for querying data from A is overcomplicated, and I would like to avoid it whenever possible.

I can see two problems here. The first is working out whether a particular change to the change log affects the current state. I am going to store the change log in an intermediate database. When a change log record arrives, I will add, remove, or update it in the intermediate database, then calculate the current state of the object and send that state to B.
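To make the first problem concrete, here is a minimal in-memory sketch of that intermediate store. Everything in it is an assumption for illustration: the `ChangeLogStore` and `Change` names, the `recordId`/`effectiveAt`/`state` fields, and the rule that the current state is simply the latest record whose timestamp is not in the future. A real version would sit on top of the intermediate database and would also have to decide what to do with ties on the timestamp and with objects whose log becomes empty.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical intermediate store: keeps A's change log and derives the state B should see. */
class ChangeLogStore {
    /** One change-log record; the field names are assumptions, not A's real schema. */
    static final class Change {
        final String recordId;
        final Instant effectiveAt;
        final String state;

        Change(String recordId, Instant effectiveAt, String state) {
            this.recordId = recordId;
            this.effectiveAt = effectiveAt;
            this.state = state;
        }
    }

    // objectId -> (recordId -> change record)
    private final Map<String, Map<String, Change>> log = new HashMap<>();
    // objectId -> state most recently handed out for sending to B
    private final Map<String, String> lastSent = new HashMap<>();

    /** Adds a new record or overwrites an existing one with the same recordId. */
    void upsert(String objectId, Change change) {
        log.computeIfAbsent(objectId, k -> new HashMap<>()).put(change.recordId, change);
    }

    void remove(String objectId, String recordId) {
        Map<String, Change> records = log.get(objectId);
        if (records != null) {
            records.remove(recordId);
        }
    }

    /** State in effect at 'now': the latest record whose timestamp is not in the future. */
    Optional<String> currentState(String objectId, Instant now) {
        Map<String, Change> records = log.getOrDefault(objectId, new HashMap<>());
        return records.values().stream()
                .filter(c -> !c.effectiveAt.isAfter(now))
                .max(Comparator.comparing((Change c) -> c.effectiveAt))
                .map(c -> c.state);
    }

    /** Returns the state to send to B, or empty if it equals what was handed out last time. */
    Optional<String> stateToSend(String objectId, Instant now) {
        Optional<String> current = currentState(objectId, now);
        if (!current.isPresent() || current.get().equals(lastSent.get(objectId))) {
            return Optional.empty();
        }
        lastSent.put(objectId, current.get());
        return current;
    }
}
```

After each incoming record is applied with `upsert` or `remove`, calling `stateToSend` tells me whether B needs a message at all, so edits to past or future records that do not change the current state produce no traffic.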

The second is tracking the change schedule. I couldn't come up with anything except running a periodic job at a constant interval (say, 15 minutes). This job would scan for all records whose timestamps fall in the interval between the previous invocation and the current one.
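The scan itself could look roughly like the sketch below. It assumes the scheduled records are indexed by their effective timestamp; the `DueChangeScanner` class and its method names are hypothetical, and in practice the index would be a query against the intermediate database rather than an in-memory `TreeMap`. The window is half-open, (last run, now], so a record is picked up exactly once.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical index of scheduled changes, ordered by the moment they take effect. */
class DueChangeScanner {
    private final NavigableMap<Instant, List<String>> schedule = new TreeMap<>();
    private Instant lastRun;

    DueChangeScanner(Instant startedAt) {
        this.lastRun = startedAt;
    }

    void schedule(Instant effectiveAt, String objectId) {
        schedule.computeIfAbsent(effectiveAt, k -> new ArrayList<>()).add(objectId);
    }

    /** Returns ids of objects with a change in (lastRun, now], then advances lastRun. */
    List<String> poll(Instant now) {
        List<String> due = new ArrayList<>();
        if (!now.isAfter(lastRun)) {
            return due; // clock did not advance; nothing new can be due
        }
        // Exclusive lower bound, inclusive upper bound: each record is reported once.
        for (List<String> ids : schedule.subMap(lastRun, false, now, true).values()) {
            due.addAll(ids);
        }
        lastRun = now;
        return due;
    }
}
```

Each periodic invocation would call `poll(Instant.now())` and recalculate the current state only for the returned objects. Note that `lastRun` would have to be persisted somewhere to survive a restart, which this sketch does not do.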

What I like about Apache Camel is its component-based approach: you only need to connect endpoints to get everything working, with very little coding. Are there any pre-existing primitives for this problem, either in Apache Camel or in EIP?

Alexey Andreev
  • You say the API for querying data from A is overcomplicated. How do you plan to get data from A? Does A have any push mechanism? Are you going to access A's repository directly? Are you going to modify A? If so, why don’t you add a better API for querying, with some cache mechanism to protect the performance? – Sergey Oct 10 '14 at 07:00
  • When A detects that some entity has changed, it serializes it into XML and writes it to a file in the filesystem. I use the following consumer: ``. Also, A has a mechanism for pushing all data, and I am going to use it for the initial loading of data. I don't own A's code, so I can neither change its querying API nor introduce a cache. It is also a matter of policy: it is much easier to convince the owners of A to allow the integration bus access if I can prove that the bus won't affect A's performance. – Alexey Andreev Oct 10 '14 at 07:10
  • Your case is not trivial and I don’t believe you can find any silver bullet solution. The closest component AFAIK is http://camel.apache.org/cache.html. If it does not fit, you can always make your own custom DB-based solution, using all the extras provided by Camel like transactions, monitoring, etc. – Sergey Oct 10 '14 at 08:13

1 Answer

I am actually working on a very similar use case, where system A sends a snapshot and updates that require translation before being sent to system B.

First, you need to trigger the mechanism that gives you the initial state (the "snapshot") from system A. The timer: component can start this one-time startup logic.

Now you will receive the snapshot data (you didn't specify how; perhaps it is an ftp file or a jms endpoint). Validate the data, split it into items, and store each item in a local in-memory cache (the cache: component, as Sergey suggests in his comment), keyed uniquely. Use an expiry policy that makes sense for your data (e.g. 48 hours).

From there, continually process the "update" data from the ftp: endpoint. For each update, you will need to match the update with the data in the cache: and determine what (and when) needs to be sent to system B.

Data that needs to be sent to system B later will need to be kept in memory or persisted in a database.

Finally, you need a scheduling mechanism to check every 15 minutes whether new data should be sent. You can easily use timer: or quartz: for this.

In summary, you can build this integration from the following components: timer, cache, ftp, and quartz, plus some custom beans / processors to perform the custom logic.

The main challenges are handling data that is cached and then updated, and working out the control mechanisms for what should happen on the initial connect, on a disconnect, or if your Camel application is restarted.

Good luck ;)

vikingsteve