
One of the optimizations performed by Jena ARQ is to "place filters close to where their dependency variables are defined".

This causes the following query plan:

  (filter (exprlist (|| (|| (isIRI ?Y) (isBlank ?Y)) (!= (datatype ?Y) <http://example.com/onto/rdf#structure>)) (|| (|| (isIRI ?Z) (isBlank ?Z)) (!= (datatype ?Z) <http://example.com/onto/rdf#structure>)))
    (bgp
      (triple ?X <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://swat.cse.lehigh.edu/onto/univ-bench.owl#Student>)
      (triple ?Y <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://swat.cse.lehigh.edu/onto/univ-bench.owl#Faculty>)
      (triple ?Z <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://swat.cse.lehigh.edu/onto/univ-bench.owl#Course>)
      (triple ?X <http://swat.cse.lehigh.edu/onto/univ-bench.owl#advisor> ?Y)
      (triple ?Y <http://swat.cse.lehigh.edu/onto/univ-bench.owl#teacherOf> ?Z)
      (triple ?X <http://swat.cse.lehigh.edu/onto/univ-bench.owl#takesCourse> ?Z)
    )))

To be transformed into the following:

  (sequence
    (filter (|| (|| (isIRI ?Z) (isBlank ?Z)) (!= (datatype ?Z) <http://example.com/onto/rdf#structure>))
      (sequence
        (filter (|| (|| (isIRI ?Y) (isBlank ?Y)) (!= (datatype ?Y) <http://example.com/onto/rdf#structure>))
          (bgp
            (triple ?X <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://swat.cse.lehigh.edu/onto/univ-bench.owl#Student>)
            (triple ?Y <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://swat.cse.lehigh.edu/onto/univ-bench.owl#Faculty>)
          ))
        (bgp (triple ?Z <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://swat.cse.lehigh.edu/onto/univ-bench.owl#Course>))))
    (bgp
      (triple ?X <http://swat.cse.lehigh.edu/onto/univ-bench.owl#advisor> ?Y)
      (triple ?Y <http://swat.cse.lehigh.edu/onto/univ-bench.owl#teacherOf> ?Z)
      (triple ?X <http://swat.cse.lehigh.edu/onto/univ-bench.owl#takesCourse> ?Z)
    )))
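To see why the second plan can be so much slower, the Java sketch below evaluates the six triple patterns in the order each plan imposes. It uses nested-loop substitution, which is roughly how ARQ feeds bindings through `sequence` and how TDB matches a BGP, and counts the intermediate rows produced. The dataset is hypothetical (20 students, 5 faculty, 10 courses; not the benchmark data), the short predicate names stand in for the full IRIs, and the "connected" order used for comparison is an assumption about what a BGP reorderer could pick when the BGP is kept whole. The FILTERs are left out because every ?Y and ?Z in the toy data is an IRI, so they would pass every row. In the filter-placed order, `(bgp Student Faculty)` and the following `(bgp Course)` share no variables, so they build a Student × Faculty × Course cross product before any join pattern is applied.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PlanCost {
    // A data triple or a triple pattern; terms starting with '?' are variables.
    static final class T {
        final String s, p, o;
        T(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    // Binds a variable (or checks a constant) against a data term.
    static boolean match(Map<String, String> b, String term, String value) {
        if (!term.startsWith("?")) return term.equals(value);
        String prev = b.putIfAbsent(term, value);
        return prev == null || prev.equals(value);
    }

    // Evaluates patterns strictly left to right, extending each binding by substitution.
    // Returns {total intermediate rows produced, final row count}.
    static long[] evaluate(List<T> patterns, List<T> data) {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(new HashMap<>());
        long produced = 0;
        for (T t : patterns) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> b : rows) {
                for (T d : data) {
                    Map<String, String> nb = new HashMap<>(b);
                    if (match(nb, t.s, d.s) && match(nb, t.p, d.p) && match(nb, t.o, d.o)) next.add(nb);
                }
            }
            produced += next.size();
            rows = next;
        }
        return new long[] {produced, rows.size()};
    }

    // Hypothetical toy data: each student has one advisor and takes two courses,
    // each course is taught by one faculty member.
    static List<T> data(int nS, int nF, int nC) {
        List<T> d = new ArrayList<>();
        for (int i = 0; i < nS; i++) {
            d.add(new T("s" + i, "type", "Student"));
            d.add(new T("s" + i, "advisor", "f" + (i % nF)));
            d.add(new T("s" + i, "takesCourse", "c" + (i % nC)));
            d.add(new T("s" + i, "takesCourse", "c" + ((i + 1) % nC)));
        }
        for (int f = 0; f < nF; f++) d.add(new T("f" + f, "type", "Faculty"));
        for (int c = 0; c < nC; c++) {
            d.add(new T("c" + c, "type", "Course"));
            d.add(new T("f" + (c % nF), "teacherOf", "c" + c));
        }
        return d;
    }

    static final T STUDENT = new T("?X", "type", "Student");
    static final T FACULTY = new T("?Y", "type", "Faculty");
    static final T COURSE = new T("?Z", "type", "Course");
    static final T ADVISOR = new T("?X", "advisor", "?Y");
    static final T TEACHER = new T("?Y", "teacherOf", "?Z");
    static final T TAKES = new T("?X", "takesCourse", "?Z");

    public static void main(String[] args) {
        List<T> data = data(20, 5, 10);
        // Order imposed by the filter-placed plan: the three rdf:type patterns first.
        long[] split = evaluate(List.of(STUDENT, FACULTY, COURSE, ADVISOR, TEACHER, TAKES), data);
        // A connected order a reorderer could choose if the BGP stays whole (assumed).
        long[] joined = evaluate(List.of(STUDENT, ADVISOR, FACULTY, TEACHER, TAKES, COURSE), data);
        System.out.println("filter-placed order: produced=" + split[0] + ", results=" + split[1]);
        System.out.println("connected order:     produced=" + joined[0] + ", results=" + joined[1]);
    }
}
```

With these sizes it reports 1380 intermediate rows for the filter-placed order (1000 of them from the cross product alone) against 140 for the connected order, both ending in the same 20 results. The cross-product step grows as |Student| · |Faculty| · |Course|, so on a benchmark-sized dataset the gap is far larger than in this toy run.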

It turns out that while the original query plan runs in milliseconds, the "optimized" query plan takes about 7 hours to complete.

Does Jena ARQ take any statistics into account when deciding where to place filters in the query plan?

I'm using Jena 3.12.0.

  • What storage layer is this running over? (I would guess TDB.) It looks like the issue is that the BGP is being broken up in a less than ideal way. There are two competing optimizations: placing filters and reordering basic graph patterns. You can explore this by reordering the basic graph pattern: put the three rdf:type triple patterns at the end. There are statistics, but the nature of the filter (is it highly selective, or just a check for a few odd cases?) makes it a hard problem to decide whether placing a filter is better than a reorder. – AndyS Jan 16 '20 at 14:02
  • Yes, I am using TDB1 and TDB2 for comparison; both showed similar behavior. I was able to reduce the query response time by orders of magnitude by setting the following option in the dataset assembler: ```ja:context [ ja:cxtName "arq:optFilterPlacement" ; ja:cxtValue "false" ] ;``` – Elton Soares Jun 20 '20 at 04:48
  • As I'm using queries from a public benchmark, I'd like to be able to optimize the performance without changing the original queries. – Elton Soares Jun 20 '20 at 04:52
  • "Jena 3.12.0." - maybe later versions do better. – AndyS Jun 20 '20 at 13:53
  • It is a pragmatic choice whether to push filters in or to use the fact the triple pattern will eliminate possibilities. Turning an optimization off if it does not work is one option. – AndyS Jun 20 '20 at 13:55
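AndyS's suggestion to move the three rdf:type patterns to the end can be reasoned about with a simplified model of filter placement. The Java sketch below cuts the BGP, in written order, right after the pattern where each filter's variable is first bound, then counts how many patterns must be joined with no shared variable (a cross product), assuming a reorderer can only reorder patterns within a segment. This is a hypothetical model written for illustration, not Jena's `TransformFilterPlacement` code; the class, labels and counting rule are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FilterPlacementSketch {
    // A triple pattern reduced to a label and the variables it mentions.
    static final class P {
        final String label;
        final Set<String> vars;
        P(String label, String... vars) { this.label = label; this.vars = Set.of(vars); }
    }

    static final P STUDENT = new P("Student", "?X");
    static final P FACULTY = new P("Faculty", "?Y");
    static final P COURSE = new P("Course", "?Z");
    static final P ADVISOR = new P("advisor", "?X", "?Y");
    static final P TEACHER = new P("teacherOf", "?Y", "?Z");
    static final P TAKES = new P("takesCourse", "?X", "?Z");

    // Cuts the BGP (in written order) right after the first pattern that binds each
    // filter variable. Simplified: every filter here depends on a single variable.
    static List<List<P>> split(List<P> bgp, List<String> filterVars) {
        Set<Integer> cuts = new HashSet<>();
        for (String v : filterVars) {
            for (int i = 0; i < bgp.size(); i++) {
                if (bgp.get(i).vars.contains(v)) { cuts.add(i); break; }
            }
        }
        List<List<P>> segments = new ArrayList<>();
        List<P> current = new ArrayList<>();
        for (int i = 0; i < bgp.size(); i++) {
            current.add(bgp.get(i));
            if (cuts.contains(i)) { segments.add(current); current = new ArrayList<>(); }
        }
        if (!current.isEmpty()) segments.add(current);
        return segments;
    }

    // Counts forced cross products: within a segment, prefer any pattern that shares a
    // variable with what is already bound; if none does (and something is already
    // bound), the next pattern has to be joined as a cross product.
    static int crossProducts(List<List<P>> segments) {
        Set<String> bound = new HashSet<>();
        int crosses = 0;
        for (List<P> seg : segments) {
            List<P> todo = new ArrayList<>(seg);
            while (!todo.isEmpty()) {
                P next = null;
                for (P p : todo) {
                    if (!Collections.disjoint(p.vars, bound)) { next = p; break; }
                }
                if (next == null) {
                    next = todo.get(0);
                    if (!bound.isEmpty()) crosses++;
                }
                todo.remove(next);
                bound.addAll(next.vars);
            }
        }
        return crosses;
    }

    // Renders segments as "[a, b] [c] ...".
    static String show(List<List<P>> segments) {
        StringBuilder sb = new StringBuilder();
        for (List<P> seg : segments) {
            List<String> labels = new ArrayList<>();
            for (P p : seg) labels.add(p.label);
            sb.append(labels).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        List<String> filters = List.of("?Y", "?Z"); // the FILTERs on ?Y and ?Z from the question
        List<List<P>> asWritten = split(List.of(STUDENT, FACULTY, COURSE, ADVISOR, TEACHER, TAKES), filters);
        List<List<P>> typesLast = split(List.of(ADVISOR, TEACHER, TAKES, STUDENT, FACULTY, COURSE), filters);
        System.out.println(show(asWritten) + "  cross products: " + crossProducts(asWritten));
        System.out.println(show(typesLast) + "  cross products: " + crossProducts(typesLast));
    }
}
```

For the query as written, the model reproduces the split seen in the transformed plan in the question (`[Student, Faculty] [Course] [advisor, teacherOf, takesCourse]`) with two forced cross products. With the rdf:type patterns last, it predicts `[advisor] [teacherOf] [takesCourse, Student, Faculty, Course]` with none, which is consistent with the suggestion that reordering the BGP avoids the slow plan without turning filter placement off.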
