Jena TDB/Fuseki Performance

Question

I have a simple SPARQL query which executes reasonably fast on my Jena TDB store using a local Fuseki SPARQL endpoint:

SELECT DISTINCT ?p 
WHERE
{ 
  ?s rdf:type dbpedia-owl:Organisation .
  ?s ?p dbpedia:California .
}
LIMIT 10

It takes maybe 10 seconds to complete, and the result contains a few properties of type owl:ObjectProperty along with other properties. When I want to show only an object property using the following query (note the additional triple pattern and the limit of 1 at the end):

SELECT DISTINCT ?p 
WHERE
{ 
  ?s rdf:type dbpedia-owl:Organisation .
  ?s ?p dbpedia:California .
  ?p a owl:ObjectProperty .
}
LIMIT 1

then I would expect the answer to appear just as quickly, and to show only one of the object properties previously shown. After all, it's only a further refinement of the previous query. However, the query takes many times longer, and finishes after several minutes instead of several seconds.

I am puzzled here. Why does the second query take so much longer?

I am using Fuseki version 1.1.0. Here's my Fuseki configuration file:

@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .
@prefix :        <http://localhost/dbpedia37#> .

[] rdf:type fuseki:Server ;
   fuseki:services (
     :service_text_tdb
   ) .

## Example of a TDB dataset and text index
## Initialize TDB
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .

## Initialize text query
[] ja:loadClass       "org.apache.jena.query.text.TextQuery" .
# A TextDataset is a regular dataset with a text index.
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
# Lucene index
text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .

## ---------------------------------------------------------------
## This URI must be fixed - it's used to assemble the text dataset.

:text_dataset rdf:type     text:TextDataset ;
    text:dataset   :dataset ;
    text:index     :indexLucene ;
    .

# A TDB dataset used for RDF storage
:dataset rdf:type      tdb:DatasetTDB ;
    tdb:location "/Users/jsimon/No-Backup/dbpedia37/tdb" ;
#    tdb:unionDefaultGraph true ; # Optional
    .

# Text index description
:indexLucene a text:TextIndexLucene ;
    text:directory <file:Lucene> ;
    ##text:directory "mem" ;
    text:entityMap :entMap ;
    .

# Mapping in the index
# URI stored in field "uri"
# rdfs:label is mapped to field "text"
:entMap a text:EntityMap ;
    text:entityField      "uri" ;
    text:defaultField     "text" ;
    text:map (
         [ text:field "text" ; text:predicate rdfs:label ]
         [ text:field "text" ; text:predicate foaf:name ]
         ) .

:service_text_tdb rdf:type fuseki:Service ;
    rdfs:label                      "TDB/text service" ;
    fuseki:name                     "ds" ;
    fuseki:serviceQuery             "query" ;
    fuseki:serviceQuery             "sparql" ;
    fuseki:serviceUpdate            "update" ;
    fuseki:serviceUpload            "upload" ;
    fuseki:serviceReadGraphStore    "get" ;
    fuseki:serviceReadWriteGraphStore    "data" ;
    fuseki:dataset                  :text_dataset ;
    .

Does it repeatedly take that long? E.g., if you run the query several times in the same session? (I don't know whether there's a "warmup period" or anything like that that might come into play.) — Joshua Taylor, Aug 26 '14 at 16:51
Since you're only selecting one in the second query, what happens if you remove `distinct`? — Joshua Taylor, Aug 26 '14 at 16:57
Unfortunately it takes just as long. When I remove the object property constraint and increase the limit, the result appears almost instantly, and it does include object properties. — Johannes, Aug 26 '14 at 17:01
Which version of Fuseki is this? And what's the configuration? — AndyS, Aug 26 '14 at 22:04
+1 to @AndyS's comment. The fact that adding a new triple mentioning an OWL property makes such a difference suggests your Fuseki configuration might include reasoning, which can have a significant performance impact — RobV, Aug 27 '14 at 08:31
Good idea, though I don't see that my configuration includes any reasoning. I added it to my post above. — Johannes, Aug 27 '14 at 10:17
But it is more complicated than a plain dataset. Try either the latest development build (there was a possible issue in this area which has been fixed) or remove the text index and test the same query to see if that helps. Also, whether REDUCED instead of DISTINCT makes a difference might be useful information. — AndyS, Aug 27 '14 at 11:53
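
For reference, the REDUCED test suggested in the last comment could look roughly like the sketch below. It is the second query from the question with DISTINCT swapped for REDUCED, which permits but does not require the engine to eliminate duplicates. The PREFIX declarations are an assumption: the question does not show them, so the conventional IRIs for these namespaces are used here and may need adjusting to match the loaded data.

```sparql
# Assumed prefix IRIs (not shown in the original post)
PREFIX rdf:         <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl:         <http://www.w3.org/2002/07/owl#>
PREFIX dbpedia:     <http://dbpedia.org/resource/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

# Same patterns as the slow query; only the solution modifier changes
SELECT REDUCED ?p
WHERE
{
  ?s rdf:type dbpedia-owl:Organisation .
  ?s ?p dbpedia:California .
  ?p a owl:ObjectProperty .
}
LIMIT 1
```

Comparing the timing of this variant against the DISTINCT version, ideally run several times in the same session, would show whether duplicate elimination contributes to the slowdown.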
