We have around 30 rules, each with multiple conditions. We are under the assumption that Drools takes one record at a time, evaluates it against the rules, and then gives the output for each one. The time taken for processing 1 million records is around 4 hours. Can't we process the records in batches, i.e. in larger chunks, to reduce the processing time? Please help me with this issue. Thanks for the response.
-
Can you describe your problem? Did you try monitoring Drools to see if it is indeed the problem? I suggest you add logging to Drools to see which rules are slow. – zenbeni Mar 10 '14 at 13:51
-
I have a similarly sized set of rules. In a test, I insert and retract facts, firing all rules after each insertion and then deleting the inserted fact. Each firing triggers a number of calculations and logical insertions and rules fire based on those insertions. This test can process 1M such evaluations in 14 seconds on my laptop. Therefore, it's fairly safe to say that the slowness is caused by your rules and how you interact with your knowledge sessions. So it's likely that we could help you more if you showed that, and asked how to improve it. – Steve Mar 10 '14 at 16:46
-
@Steve There was a thread here only a few days ago where it turned out that (IIRC) accumulate causes O(n^2) evaluations of hashCode(). I mention this only to emphasize that innocent-looking or even good-practice rules can cause significant delays, so the fact that you can process 1M facts in 14 seconds is not very relevant. Learning how long it takes to insert-and-fire in small sets, or even for single facts, is essential for deciding where the bottleneck is (see the timing sketch after these comments). – laune Mar 11 '14 at 07:31
-
I didn't say the rules were necessarily 'wrong'. Only that nobody can provide a remotely decent answer to this without knowing what the rules look like, as the rules themselves are usually the cause of any slowness. And *usually*, optimising the rules will make more difference than messing around with batch sizes. – Steve Mar 11 '14 at 09:14
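As the last two comments suggest, measuring a single insert-and-fire cycle is the quickest way to locate the bottleneck. A minimal sketch, assuming a Drools 5.x StatefulKnowledgeSession already built from your rules (SingleFactTimer and timeOneNanos are made-up names for illustration):

```java
import org.drools.runtime.StatefulKnowledgeSession;
import org.drools.runtime.rule.FactHandle;

public class SingleFactTimer {

    // Time one insert + fireAllRules + retract cycle for a single fact.
    // If one fact already costs tens of milliseconds here, the rules
    // themselves are the bottleneck, not the lack of batching.
    public static long timeOneNanos(StatefulKnowledgeSession ksession, Object fact) {
        long start = System.nanoTime();
        FactHandle handle = ksession.insert(fact);
        ksession.fireAllRules();
        ksession.retract(handle); // leave Working Memory as we found it
        return System.nanoTime() - start;
    }
}
```

Run this for a handful of representative records; the per-fact cost tells you whether to optimise the rules or the insertion strategy.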
1 Answer
Inserting 1M facts in one batch is a very bad strategy (unless you need to find combinations out of the lot). The documentation makes it clear that all work (at least in 5.x) is done during inserts and modifications. (6.x is reportedly different, but it's still bad practice to needlessly fill your memory up with objects galore.)
Simply insert, and after some suitable number, call fireAllRules() and process (transmit, ...) the results. Make sure that no "dead stock" remains in Working Memory from such a batch; this would also slow you down.
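A minimal sketch of this insert/fire/retract cycle, assuming a Drools 5.x StatefulKnowledgeSession; BatchedRuleRunner is a made-up name, and the batch size is an arbitrary starting point, not a recommendation:

```java
import java.util.ArrayList;
import java.util.List;

import org.drools.runtime.StatefulKnowledgeSession;
import org.drools.runtime.rule.FactHandle;

public class BatchedRuleRunner {

    // Tunable batch size: an assumption, not a measured optimum.
    private static final int BATCH_SIZE = 10000;

    // Insert facts in chunks, fire the rules once per chunk, then retract
    // the chunk so no "dead stock" accumulates in Working Memory.
    public static void run(StatefulKnowledgeSession ksession, List<?> facts) {
        List<FactHandle> handles = new ArrayList<FactHandle>(BATCH_SIZE);
        for (int i = 0; i < facts.size(); i++) {
            handles.add(ksession.insert(facts.get(i)));
            boolean batchFull = handles.size() == BATCH_SIZE;
            boolean lastFact = (i == facts.size() - 1);
            if (batchFull || lastFact) {
                ksession.fireAllRules();
                // ... collect/transmit this batch's results here ...
                for (FactHandle h : handles) {
                    ksession.retract(h); // keep Working Memory clean
                }
                handles.clear();
            }
        }
    }
}
```

Measure a few batch sizes (say 1k, 10k, 50k) against your own rules to find a good value, and dispose of the session (ksession.dispose()) once the full run completes.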

– laune