Difference in performance between using VALUES keyword and using directly the URI in the query?

Question

I have a fairly complex SPARQL query with the structure outlined below, involving multiple graph patterns, UNION and nested FILTER NOT EXISTS.

I want the query to remain generic and to be able to inject values for certain variables at execution time. My idea is to append a VALUES clause at the end of the query to set those variables. In the structure below, I set the value of ?x and show all the places in the query where ?x occurs.

However, in Fuseki, executing the query this way takes around 4 to 5 seconds, whereas manually replacing the ?x variable with a URI, instead of specifying a VALUES clause, makes it run very fast.

I always thought that appending a VALUES clause after the WHERE clause was like setting values inline for some variables, so I expected that using VALUES and replacing the variables with their corresponding URIs would be equivalent in terms of query execution. Can someone confirm the expected behavior of the VALUES keyword, and also explain the difference between using it outside the WHERE clause and inside it?
Does the fact that the variable set using VALUES appears in a FILTER NOT EXISTS clause change anything?
Can you confirm this is the correct approach for the requirement above (I want the query to remain generic and I want to be able to inject values for certain variables at execution time)?
Is it possible that this behavior is specific to how Fuseki handles VALUES?

Thanks!

SELECT DISTINCT ...
WHERE {
    # ?x ...
    # ... basic graph pattern here 

    {
      {
        # ... basic graph pattern here 

        FILTER NOT EXISTS {
            # ?x ...
            # ... basic graph pattern here
        }

        FILTER NOT EXISTS {
            # ... basic graph pattern here
            FILTER NOT EXISTS {
                # ?x ...
                # ... basic graph pattern here
            }
        }       
      }
      UNION
      {
        ?x ...
        # ... basic graph pattern here
      }
      UNION
      {
        # ... basic graph pattern here

        FILTER NOT EXISTS {
            ?x ...
            # ... basic graph pattern here
        }

        FILTER NOT EXISTS {
            # ... basic graph pattern here
            FILTER NOT EXISTS {
                ?x ...
                # ... basic graph pattern here
            }
        }
      }
      UNION
      {
        ?x ...
      }
    }
}
VALUES ?x { <http://example.com/Foo> }

score 3 · Accepted Answer · answered Jul 04 '19 at 17:44

Not supposed to be an answer, but formatting in comments is impossible...

There is at least one obvious difference in the algebra tree. How this is handled is probably implementation-specific. Andy knows better and will hopefully give a more useful answer than mine.

Without `VALUES`:

Query

SELECT  ?s ?o
WHERE
  {   { <test_val>  <p>  ?o }
    UNION
      { <test_val>  <p>  ?o
        FILTER NOT EXISTS { <test_val>  a  ?type }
      }
  }

Algebra tree (optimized)

(base <http://example/base/>
  (project (?s ?o)
    (union
      (bgp (triple <test_val> <p> ?o))
      (filter (notexists (bgp (triple <test_val> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type)))
        (bgp (triple <test_val> <p> ?o))))))

With `VALUES`:

Query

SELECT  ?s ?o
WHERE
  {   { ?s  <p>  ?o }
    UNION
      { ?s  <p>  ?o
        FILTER NOT EXISTS { ?s  a  ?type }
      }
  }
VALUES ?s { <test_val> }

Algebra tree

(base <http://example/base/>
  (project (?s ?o)
    (join
      (union
        (bgp (triple ?s <p> ?o))
        (filter (notexists (bgp (triple ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type)))
          (bgp (triple ?s <p> ?o))))
      (table (vars ?s)
        (row [?s <test_val>])
      ))))

Algebra tree (optimized)

(base <http://example/base/>
  (project (?s ?o)
    (sequence
      (table (vars ?s)
        (row [?s <test_val>])
      )
      (union
        (bgp (triple ?s <p> ?o))
        (filter (notexists (bgp (triple ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type)))
          (bgp (triple ?s <p> ?o)))))))

VALUES at the end is "like setting variables" but isn't the same. The optimizer tries to push the values in, but that can't happen in all cases because it can change the semantics. In a complex query, there is a higher chance of a blocking pattern, even if, knowing the data, you know that the change in semantics does not occur. There are two things to try: (1) put the VALUES where you mean them to be, next to the variables they influence; (2) use the class QueryTransformOps, which rewrites queries based on a map of variable to value. — AndyS, Jul 04 '19 at 21:06
Thank you, this is really useful to know. As this feature is key to what I am doing, I really need to understand the cases where using VALUES differs from replacing the variable with a URI in the query, and I will ask a separate question about this. Using QueryTransformOps could help, although I could just as well do a regex search/replace. I can't put a VALUES clause everywhere it is required in the query, as the query string is externalized in a file and values are set at runtime depending on the context. — ThomasFrancart, Jul 05 '19 at 07:39
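
For illustration, AndyS's first suggestion could be applied to the small test query from the answer roughly as sketched below. This is only a sketch: it reuses the answer's placeholder IRIs (`<test_val>`, `<p>`), which resolve against the base IRI, and it has not been timed against Fuseki, so whether it removes the slowdown on a real dataset is an assumption to check. The idea is that each UNION branch binds ?s itself before its triple patterns, instead of relying on the optimizer to push a trailing VALUES block into the branches.

```sparql
SELECT ?s ?o
WHERE {
  {
    # Bind ?s inside this branch, next to the pattern that uses it.
    VALUES ?s { <test_val> }
    ?s <p> ?o
  }
  UNION
  {
    # Repeat the binding in every branch that mentions ?s.
    VALUES ?s { <test_val> }
    ?s <p> ?o
    # FILTER NOT EXISTS is tested against each solution of the enclosing
    # group, so ?s is already bound to <test_val> when the inner pattern is checked.
    FILTER NOT EXISTS { ?s a ?type }
  }
}
```

The cost is that the value appears once per branch, so a query kept in an external file would need the same value injected in several places rather than once at the end.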
