Is Apache Camel's idempotent consumer pattern scalable?

Question

I'm using Apache Camel 2.13.1 to poll a database table which will have upwards of 300k rows in it. I'm looking to use the Idempotent Consumer EIP to filter rows that have already been processed.

I'm wondering, though, whether the implementation is really scalable. My Camel context is:

<camelContext xmlns="http://camel.apache.org/schema/spring">
    <route id="main">
        <from
            uri="sql:select * from transactions?dataSource=myDataSource&amp;consumer.delay=10000&amp;consumer.useIterator=true" />
        <transacted ref="PROPAGATION_REQUIRED" />
        <enrich uri="direct:invokeIdempotentTransactions" />
        <!-- Any processors here will be executed on all messages -->
    </route>

    <route id="idempotentTransactions">
        <from uri="direct:invokeIdempotentTransactions" />
        <idempotentConsumer
            messageIdRepositoryRef="jdbcIdempotentRepository">
            <ognl>#{request.body.ID}</ognl>
            <!-- Anything here will only be executed for non-duplicates -->
            <log message="non-duplicate" />
            <to uri="stream:out" />
        </idempotentConsumer>
    </route>
</camelContext>
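
The jdbcIdempotentRepository bean referenced by the route is not shown here. As a rough sketch, assuming the JdbcMessageIdRepository class from camel-sql, the same myDataSource bean, and the processor name 'transactionProcessor' (the value that appears in the query further down), it might be declared like this:

```xml
<!-- Sketch only: assumes camel-sql's JdbcMessageIdRepository and a DataSource bean named myDataSource -->
<bean id="jdbcIdempotentRepository"
      class="org.apache.camel.processor.idempotent.jdbc.JdbcMessageIdRepository">
    <!-- DataSource that holds the CAMEL_MESSAGEPROCESSED table -->
    <constructor-arg ref="myDataSource" />
    <!-- Processor name stored alongside each message ID (assumed value) -->
    <constructor-arg value="transactionProcessor" />
</bean>
```

The processor name lets several routes share one table without their message IDs colliding.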

It would seem that the full 300k rows are going to be processed every 10 seconds (via consumer.delay parameter) which seems very inefficient. I would expect some sort of feedback loop as part of the pattern so that the query that feeds the filter could take advantage of the set of rows already processed.

However, the messageid column in the CAMEL_MESSAGEPROCESSED table has the pattern of

 {1908988=null}

where 1908988 is the request.body.ID I've set the EIP to key on, so the value isn't easy to incorporate into my query.

Is there a better way of using the CAMEL_MESSAGEPROCESSED table as a feedback loop into my select statement, so that the SQL server does most of the work?

Update:

So, I've since found out that it was my ognl code that was causing the odd message id column value. Changing it to

<el>${in.body.ID}</el>

has fixed it. Now that I have a usable messageId column, I can change my 'from' SQL query to

select * from transactions tr where tr.ID NOT IN (select cmp.messageid from CAMEL_MESSAGEPROCESSED cmp where cmp.processor = 'transactionProcessor')

but I still think I'm corrupting the Idempotent Consumer EIP.

Does anyone else do this? Any reason not to?
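
For illustration, the feedback-loop idea could be wired into the main route roughly as follows. This is only a sketch: the table and column names are copied from the query in the question and may not match the actual schema, and the subquery uses NOT IN so that rows already recorded in CAMEL_MESSAGEPROCESSED are excluded rather than selected.

```xml
<!-- Sketch only: table and column names are taken from the question and may differ in your schema -->
<route id="main">
    <!-- NOT IN skips rows whose ID has already been recorded by the idempotent repository -->
    <from
        uri="sql:select * from transactions tr where tr.ID not in (select cmp.messageid from CAMEL_MESSAGEPROCESSED cmp where cmp.processor = 'transactionProcessor')?dataSource=myDataSource&amp;consumer.delay=10000&amp;consumer.useIterator=true" />
    <transacted ref="PROPAGATION_REQUIRED" />
    <enrich uri="direct:invokeIdempotentTransactions" />
</route>
```

The idempotentConsumer stays in place as a safety net, while the query keeps each poll down to unprocessed rows. If tr.ID is numeric and messageid is a string column, some databases will need an explicit cast in the subquery.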

Your update seems to fix the problem. I can't see anything wrong with the approach as long as you don't process everything every time. Sorry, I don't have much experience with the idempotentConsumer yet. — Namphibian, Jun 03 '14 at 20:40

nefo_x · Answer 1 · 2014-10-06T13:10:55.743

Yes, it is, but you need scalable storage for the set of already processed message IDs. You can use either Hazelcast - http://camel.apache.org/hazelcast-idempotent-repository-tutorial.html or Infinispan - http://java.dzone.com/articles/clustered-idempotent-consumer - depending on which one is already in your stack. Of course, the JDBC repository would also work, but only if it meets your performance requirements.
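
As a minimal sketch of the Hazelcast option, assuming camel-hazelcast is on the classpath, a default Hazelcast configuration is acceptable, and the HazelcastIdempotentRepository constructor taking an instance and a repository name is available in your Camel version (the bean IDs and the repository name below are illustrative):

```xml
<!-- Sketch only: assumes camel-hazelcast on the classpath and a default Hazelcast configuration -->
<bean id="hazelcastInstance" class="com.hazelcast.core.Hazelcast"
      factory-method="newHazelcastInstance" />

<bean id="hazelcastIdempotentRepository"
      class="org.apache.camel.processor.idempotent.hazelcast.HazelcastIdempotentRepository">
    <constructor-arg ref="hazelcastInstance" />
    <!-- Repository name; "transactionProcessor" is an illustrative value -->
    <constructor-arg value="transactionProcessor" />
</bean>
```

The route would then point at it with messageIdRepositoryRef="hazelcastIdempotentRepository" instead of the JDBC repository.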
