How to filter down a large Jena Model in TDB

Question

I have a large RDF model that doesn't fit in memory. I am currently loading the entire thing into TDB, but I would like to filter it down to a subgraph instead: all properties of all resources that are an instance (rdf:type) or a subclass (rdfs:subClassOf) of some "root" concept.

What I have tried is to execute a DESCRIBE statement against the full TDB model which describes the subset of the graph I am interested in ({ ?x rdf:type/rdfs:subClassOf* ?type }). The problem I have is twofold:

1. On a smaller (sample) dataset, the DESCRIBE statement completes, but I can't figure out how to write the resulting Model back into the TDB (I want to throw away all the other data). I tried calling tdbModel.setDefaultModel(), but it throws an exception. So what I am doing now is creating a second TDB location, getting its default model, and adding the result of the DESCRIBE statement to that model. Is there a better way?
2. On the full dataset, I think the DESCRIBE statement would return over 500k triples, and it has been running for a couple of hours without completing. Is there a more efficient way to do this?

I don't know whether it will make the query faster or slower, but rather than using a DESCRIBE (which, by default, lets Jena decide what counts as a description), have you tried doing this with a CONSTRUCT query instead? — Joshua Taylor, Jan 03 '14 at 22:56
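
A minimal sketch of the CONSTRUCT form suggested in the comment above, assuming a hypothetical root class ex:Root (the ex: prefix and its IRI are placeholders, not taken from the question). Unlike DESCRIBE, it states explicitly which triples are returned: every triple whose subject is an instance of the root class or of one of its subclasses.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# Hypothetical namespace for the "root" concept; replace with the real IRI.
PREFIX ex:   <http://example.org/>

CONSTRUCT { ?x ?p ?o }
WHERE {
  # ?x is typed with ex:Root or with any direct or indirect subclass of it
  ?x rdf:type/rdfs:subClassOf* ex:Root .
  # keep every property of each matching resource
  ?x ?p ?o .
}
```

This only selects instances. To keep the subclasses themselves as well, a second branch matching ?x rdfs:subClassOf* ex:Root could be combined with UNION. Whether this runs faster than the DESCRIBE on the full dataset has not been measured here.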

