Is there any way to run inference with Jena without loading all the data into memory?

Question

I have a large amount of RDF data to run inference over, and I need to develop my own inference rules. Is there any way to do this? Is it OK to use Jena rules and SPARQL for it? Do Jena rules and SPARQL queries have to load all the data into memory? I hope to get answers soon, and thanks in advance!

score 3 · Answer 1 · answered May 11 '12 at 07:54

The Jena inference engines definitely work best when all of the data is loaded into memory. This is because, while it is possible to connect an inference engine to any Jena Model, including a persistent store such as TDB, the inference algorithms make many calls to check the presence or absence of a particular triple in the model. This just gets inefficient when that check requires hitting disk.

If you have relatively straightforward inference requirements, you may be able to express your entailments via carefully constructed SPARQL queries, in which case you can probably get away with querying a TDB or SDB store directly. It just depends on the complexity of the query.

If the contents of your triple store are reasonably stable, or can be partitioned into a stable, persistent set and a dynamic in-memory set, then one strategy is to pre-compute the inference closure and store that in a persistent store. The classic space/time trade-off, in other words. There are two ways to do this: first, use the inference engine with an in-memory store, giving it as much heap space as you can; second, use Jena's RIOT infer command-line script.

For large-scale RDF inferencing, an alternative to the open-source Jena platform is the commercial product Stardog from Clark & Parsia (which I believe has a Jena model connector, but I haven't used it myself).

Does that mean it is possible to use Jena for inference without loading all the data into memory? My RDF data is SKOS, so RIOT cannot infer over it. At the beginning the data will change frequently as more data is added, but after that it will be relatively stable. — Wang Ruiqi, May 12 '12 at 15:24
As far as I remember, none of the built-in Jena reasoners have SKOS inference in any case. I assume you mean things like transitivity of skos:broader, etc? In this case, you're going to have to write some custom code - at the very least, custom inference rules. However, if your main concern is to be able to stream SKOS processing, you might be best off building your own streaming code to handle the input and construct a SKOS concept index, so that you compute the transitive closure incrementally yourself. — Ian Dickinson, May 13 '12 at 23:24
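
The incremental transitive closure suggested in the comment above could look roughly like the following. This is a minimal sketch in plain Java with no Jena dependency; the class name BroaderIndex and its methods are hypothetical, and concepts are represented as plain strings. Each new "narrower broader" pair links everything at or below the narrower concept to everything at or above the broader one. Note that in SKOS the entailed pairs would normally be stated with skos:broaderTransitive rather than skos:broader. The index holds only the concept hierarchy in memory, not the full set of triples.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: keeps the transitive closure of a broader relation up to date. */
final class BroaderIndex {
    // concept -> every ancestor known so far (already transitively closed)
    private final Map<String, Set<String>> ancestors = new HashMap<>();
    // concept -> every descendant known so far (already transitively closed)
    private final Map<String, Set<String>> descendants = new HashMap<>();

    /**
     * Records one asserted pair and returns the (concept, ancestor) pairs
     * that became known because of it, including the asserted pair if new.
     */
    List<String[]> add(String narrower, String broader) {
        // Everything at or below `narrower` ...
        Set<String> below = new HashSet<>(
                descendants.getOrDefault(narrower, Collections.emptySet()));
        below.add(narrower);
        // ... gains everything at or above `broader`.
        Set<String> above = new HashSet<>(
                ancestors.getOrDefault(broader, Collections.emptySet()));
        above.add(broader);

        List<String[]> fresh = new ArrayList<>();
        for (String d : below) {
            for (String a : above) {
                if (ancestors.computeIfAbsent(d, k -> new HashSet<>()).add(a)) {
                    descendants.computeIfAbsent(a, k -> new HashSet<>()).add(d);
                    fresh.add(new String[] {d, a});
                }
            }
        }
        return fresh;
    }

    /** Returns all ancestors currently known for a concept (empty if none). */
    Set<String> ancestorsOf(String concept) {
        return ancestors.getOrDefault(concept, Collections.emptySet());
    }

    public static void main(String[] args) {
        BroaderIndex index = new BroaderIndex();
        // Pairs may arrive in any order, as they would from a stream.
        String[][] stream = {
            {"ex:cat", "ex:mammal"},
            {"ex:animal", "ex:organism"},
            {"ex:mammal", "ex:animal"},
        };
        for (String[] edge : stream) {
            for (String[] pair : index.add(edge[0], edge[1])) {
                System.out.println(pair[0] + " skos:broaderTransitive " + pair[1]);
            }
        }
    }
}
```

In a real pipeline the pairs would come from a streaming RDF parser, and the returned pairs could be written straight to a persistent store instead of being printed.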

score 1 · Answer 2 · answered May 12 '12 at 08:13

In addition to what Ian said and depending on your rules, if materializing all inferred triples is feasible in a streaming fashion in your case, have a look at RIOT's infer source code and, if you need more than RDFS, think about how you might add support for a subset of OWL. You can find the source code here:

The approach of RIOT's infer command can also be used with MapReduce; you can find an example here:

https://github.com/castagna/tdbloader4/blob/f5363fa49d16a04a362898c1a5084ade620ee81b/src/main/java/org/apache/jena/tdbloader4/InferDriver.java
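
To illustrate the streaming idea, here is a minimal sketch in plain Java. It is not RIOT's actual code: the class StreamingInfer and its methods are hypothetical, triples are plain strings, and only two RDFS-style rules (sub-property and sub-class propagation) are covered. The assumption is that the schema is small enough to keep in memory as pre-computed closures, while the data triples are handled one at a time and never retained.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical sketch: emits entailed triples one input triple at a time. */
final class StreamingInfer {
    static final String TYPE = "rdf:type";

    // term -> all transitive super-terms; both maps must already be closed
    private final Map<String, Set<String>> superClasses;
    private final Map<String, Set<String>> superProperties;

    StreamingInfer(Map<String, Set<String>> superClasses,
                   Map<String, Set<String>> superProperties) {
        this.superClasses = superClasses;
        this.superProperties = superProperties;
    }

    /** Computes the transitive closure of a small "direct super-term" map. */
    static Map<String, Set<String>> closure(Map<String, Set<String>> direct) {
        Map<String, Set<String>> closed = new HashMap<>();
        for (String start : direct.keySet()) {
            Set<String> seen = new LinkedHashSet<>();
            Deque<String> todo = new ArrayDeque<>(direct.get(start));
            while (!todo.isEmpty()) {
                String next = todo.pop();
                if (seen.add(next)) {
                    todo.addAll(direct.getOrDefault(next, Collections.emptySet()));
                }
            }
            closed.put(start, seen);
        }
        return closed;
    }

    /** Passes the triple through, then emits what it entails; keeps no data state. */
    void process(String s, String p, String o, Consumer<String[]> out) {
        out.accept(new String[] {s, p, o});
        // (s p o) and p subPropertyOf q  =>  (s q o)
        for (String q : superProperties.getOrDefault(p, Collections.emptySet())) {
            out.accept(new String[] {s, q, o});
        }
        // (s rdf:type C) and C subClassOf D  =>  (s rdf:type D)
        if (TYPE.equals(p)) {
            for (String d : superClasses.getOrDefault(o, Collections.emptySet())) {
                out.accept(new String[] {s, TYPE, d});
            }
        }
    }

    public static void main(String[] args) {
        // Two sub-property axioms from the SKOS vocabulary, used as the schema.
        Map<String, Set<String>> props = new HashMap<>();
        props.put("skos:broader",
                new HashSet<>(Collections.singleton("skos:broaderTransitive")));
        props.put("skos:broaderTransitive",
                new HashSet<>(Collections.singleton("skos:semanticRelation")));

        StreamingInfer infer = new StreamingInfer(
                Collections.<String, Set<String>>emptyMap(), closure(props));
        infer.process("ex:cat", "skos:broader", "ex:mammal",
                t -> System.out.println(String.join(" ", t)));
    }
}
```

Because each triple is handled independently, the same per-triple function could run inside a mapper in a MapReduce job. Rules that need to join two data triples, such as transitivity, do not fit this pattern and need an index or a separate pass.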

My English is poor, so I hope I can make this clear. If I use Jena to load a model from TDB, does that mean all the data of that model will be loaded into memory? — Wang Ruiqi, May 12 '12 at 15:18
And if I use Jena to execute a SPARQL 'select' query, for example: QueryExecution qexec = QueryExecutionFactory.create(sparqlQueryString, dataset); where the second parameter is a dataset, does that mean this will not load all the data into memory? — Wang Ruiqi, May 12 '12 at 15:21

