Where can I get a large number of rules and facts, or how can I generate them, for a Drools benchmark?

Question

I would like to test Drools performance, such as memory consumption and inference speed, on large amounts of data. I did this by running the benchmarks available in the Drools project (https://github.com/droolsjbpm/drools), in the same way as the other examples there. These are commonly used benchmarks such as Manners, Waltz and WaltzDB, but on my computer they take dozens of seconds. Could you suggest any sources of rules and objects/facts that I can use and test for free with Drools? Or is it possible to generate such data and rules? If so, how could I do that?

Thanks for help.

I want tests that show borderline cases, so that I can investigate which objects cause memory and speed problems. From scientific papers about Rete I can guess where the problems might be, but I'd like to know for this specific implementation. – gadon, Nov 08 '13 at 17:21
I mean cases where the objects stored in memory occupy, e.g., a few hundred MB, or even more if possible. Based on professional opinion, I understand that inference engines tend to have trouble with such large amounts of data, and I'm curious about those cases. Thanks. – gadon, Nov 08 '13 at 20:03

score 4 · Accepted Answer · edited Feb 11 '19 at 18:29


It's worth noting that those benchmarks have no purpose whatsoever. They are mostly specifically designed to do things which are inefficient in rules engines. They even have very little value for comparison between engines, given that you're unlikely to ever write a real-world application that is anything like Miss Manners.

If you just want large amounts of data for your tests, there is loads of open data out there. For instance, the UK provides a variety of open data sets. You can pick one which suits your experiment here.

http://data.gov.uk/data/search

Or you could grab a load of gene sequence data from GenBank:

http://www.ncbi.nlm.nih.gov/genbank/

There's loads of free data out there, for which you could write rules.

If you are really looking to benchmark rules engines, then it would probably be better to generate the data yourself. That's the best way to ensure that you get reliable statistical variations.

However, all you will be doing is benchmarking a specific set of rules. Any such benchmarks would be redundant as soon as the rules change.
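As a rough illustration of generating the data yourself, here is a minimal Java sketch. Everything in it is an assumption made for the example rather than anything from the Drools code base: the `Reading` fact type, the `SyntheticBenchmarkData` class, the package name `bench`, and the choice of a normal distribution. It uses a fixed random seed so that each run sees identical facts, and it emits plain DRL text with one simple threshold rule per requested rule. Loading the result into a Drools session is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Random;

final class SyntheticBenchmarkData {

    // Hypothetical fact type, used only for illustration.
    static final class Reading {
        private final int id;
        private final int sensor;
        private final double value;

        Reading(int id, int sensor, double value) {
            this.id = id;
            this.sensor = sensor;
            this.value = value;
        }

        // Getters, so that rule constraints can read the fields as bean properties.
        public int getId() { return id; }
        public int getSensor() { return sensor; }
        public double getValue() { return value; }
    }

    // Creates `count` facts from a fixed seed, so repeated runs see identical data.
    static List<Reading> generateFacts(int count, int sensors, long seed) {
        Random random = new Random(seed);
        List<Reading> facts = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            int sensor = random.nextInt(sensors);
            // Normally distributed around 50 with standard deviation 15 (arbitrary choice).
            double value = 50.0 + 15.0 * random.nextGaussian();
            facts.add(new Reading(i, sensor, value));
        }
        return facts;
    }

    // Builds DRL text with `ruleCount` single-pattern rules and evenly spaced thresholds.
    // The fact type must be resolvable from `packageName` when the DRL is compiled.
    static String generateRules(String packageName, String factType, int ruleCount) {
        StringBuilder drl = new StringBuilder();
        drl.append("package ").append(packageName).append(";\n\n");
        for (int i = 0; i < ruleCount; i++) {
            double threshold = 100.0 * i / ruleCount;
            drl.append("rule \"threshold_").append(i).append("\"\n")
               .append("when\n")
               .append("    ").append(factType).append("( value > ")
               .append(String.format(Locale.ROOT, "%.4f", threshold)).append(" )\n")
               .append("then\n")
               .append("end\n\n");
        }
        return drl.toString();
    }

    public static void main(String[] args) {
        List<Reading> facts = generateFacts(100_000, 50, 42L);
        System.out.println("Generated " + facts.size() + " facts");
        System.out.println(generateRules("bench", "Reading", 3));
    }
}
```

Varying `count`, `sensors`, the distribution and `ruleCount` independently gives some control over which dimension is being stressed, although the results still only describe this particular rule shape.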

edited Feb 11 '19 at 18:29

Lauro Gripa Neto


answered Nov 09 '13 at 14:32

Steve


Thanks for the answer, Steve! Recently I also thought about extending the Miss Manners benchmark by adding guests. Do you think it is not worth the effort? Maybe you could give me some advice in case I decide to create the data myself. – gadon Nov 09 '13 at 16:53
It all depends on what you're trying to achieve with a benchmark. What does it actually tell you? I wouldn't be surprised if there's already plenty of research out there on what happens if you add more guests to Miss Manners. There's also plenty out there on how to cheat it, as a number of rules engines have been known to optimise for the benchmarks. – Steve Nov 09 '13 at 17:11

Here's one example of more guests: http://blog.athico.com/2009/05/miss-manners-2009-yet-another-drools.html – Steve Nov 09 '13 at 17:13
