Try the query below on the Wikidata endpoint, with and without the LIMIT clause at the end.

With LIMIT clause here, and without LIMIT clause here.

Now see the difference... I think the reason for this is the ?duration variable in the projection of the first subquery, which has no bindings and is not in the domain of the solution mapping. I think this is definitely a bug in Blazegraph. But the question remains: if we project on a variable that doesn't exist in the domain of the solution, and then use the variable for joining (as with ?duration in the example), what should the behaviour be? Should the variable be ignored, or treated as an unbound variable?

SELECT ?film ?duration
WHERE
{
    {
      SELECT ?film ?duration
      WHERE
         { ?film <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q11424> . }
    }

    {
      SELECT ?film ?duration
      WHERE
         { ?film <http://www.wikidata.org/prop/direct/P2047> ?duration . }
    }
}
#LIMIT 1000
Median Hilal

1 Answer

Is there a workaround that makes LIMIT work?

Yes. If you remove ?duration from the SELECT clause of the first subquery, then the query works with LIMIT.

Is there a bug in Blazegraph?

Yes. Removing ?duration should not change the result of the query, but it obviously changes the result if LIMIT is present.

We know that ?duration is unbound in all solutions of the first subquery, regardless of whether we removed ?duration from the SELECT clause. So the only difference between the two queries is whether the variable is in scope or not. And the definition of SPARQL's join operation does not refer to variable scope at all. It just depends on the variables that are actually bound in solutions. So, changing what variables are in scope is not supposed to change the result of the query.
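
To make the argument above concrete, here is a minimal Python sketch of the spec's compatible/join/project definitions. It models each solution as a dict in which an unbound variable is simply absent. The film identifiers and duration values are made-up placeholders rather than Wikidata data, and the helper names are my own, not anything from Blazegraph.

```python
from typing import Dict, List

# A solution mapping: variable name -> RDF term. Unbound variables are absent.
Solution = Dict[str, str]


def compatible(m1: Solution, m2: Solution) -> bool:
    """True if the two solutions agree on every variable bound in both."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def join(omega1: List[Solution], omega2: List[Solution]) -> List[Solution]:
    """Merge every compatible pair of solutions."""
    return [{**m1, **m2} for m1 in omega1 for m2 in omega2 if compatible(m1, m2)]


def project(omega: List[Solution], variables: List[str]) -> List[Solution]:
    """Restrict each solution to the projected variables that are bound in it."""
    return [{v: m[v] for v in variables if v in m} for m in omega]


# Hypothetical data standing in for the two subquery patterns.
films = [{"film": "ex:film1"}, {"film": "ex:film2"}]
durations = [
    {"film": "ex:film1", "duration": "120"},
    {"film": "ex:film3", "duration": "95"},
]

second = project(durations, ["film", "duration"])
with_duration = join(project(films, ["film", "duration"]), second)
without_duration = join(project(films, ["film"]), second)

print(with_duration)     # [{'film': 'ex:film1', 'duration': '120'}]
print(without_duration)  # [{'film': 'ex:film1', 'duration': '120'}]
```

Projecting ?duration in the first subquery adds nothing to the domain of any of its solutions, so in this model the join sees identical inputs either way.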

If we project on a variable that doesn't exist in the domain of the solution, and then use the variable for joining (as with ?duration in the example), what should the behaviour be?

The variable should be treated as always unbound, but in scope. This means:

  • bound(?var) should always be false
  • SELECT * should include a ?var column that is always empty
  • SELECT ... ("xxx" AS ?var) should result in a syntax error because ?var is already in scope
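
The three points above can be mimicked with a small Python model that tracks the in-scope variables of a pattern separately from the variables bound in each solution. This is only a sketch: `Pattern`, `select_star` and `select_as` are hypothetical names, the data is invented, and a real SPARQL engine would reject the third case at parse time rather than raise a runtime error.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

Solution = Dict[str, str]


@dataclass
class Pattern:
    in_scope: Set[str]         # variables visible to the enclosing query
    solutions: List[Solution]  # each dict holds only the bound variables


def bound(solution: Solution, var: str) -> bool:
    """Models bound(?var): true only if var is in the solution's domain."""
    return var in solution


def select_star(p: Pattern) -> Tuple[List[str], List[List[Optional[str]]]]:
    """Models SELECT *: one column per in-scope variable, None where unbound."""
    columns = sorted(p.in_scope)
    rows = [[s.get(v) for v in columns] for s in p.solutions]
    return columns, rows


def select_as(p: Pattern, var: str, value: str) -> Pattern:
    """Models SELECT ... (value AS ?var): var must not already be in scope."""
    if var in p.in_scope:
        raise ValueError(f"?{var} is already in scope")
    return Pattern(p.in_scope | {var}, [{**s, var: value} for s in p.solutions])


# The first subquery: ?duration is in scope but never bound.
first_subquery = Pattern({"film", "duration"}, [{"film": "ex:film1"}])

print(bound(first_subquery.solutions[0], "duration"))  # False
print(select_star(first_subquery))  # (['duration', 'film'], [[None, 'ex:film1']])
```

Calling `select_as(first_subquery, "duration", "xxx")` raises `ValueError` in this model, which stands in for the syntax error described in the last bullet.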
cygri
  • 1. Solution Mapping A solution mapping, μ, is a partial function μ : V -> RDF-T. The domain of μ, dom(μ), is the subset of V where μ is defined. – Median Hilal Jun 06 '19 at 18:16
  • 2. Project(Ψ, PV) = [ Proj(Ψ[μ], PV) | μ in Ψ ] – Median Hilal Jun 06 '19 at 18:17
  • 3. Compatible Mappings Two solution mappings μ1 and μ2 are compatible if, for every variable v in dom(μ1) and in dom(μ2), μ1(v) = μ2(v). – Median Hilal Jun 06 '19 at 18:17
  • 4. Join(Ω1, Ω2) = { merge(μ1, μ2) | μ1 in Ω1 and μ2 in Ω2, and μ1 and μ2 are compatible } – Median Hilal Jun 06 '19 at 18:17
  • A variable belongs to the domain of a solution if and only if it is bound in the solution. Those mean the same thing. – cygri Jun 06 '19 at 19:27
  • “Solution” in SPARQL is a technical term and synonymous with “solution mapping”. Each “row” in a SPARQL SELECT result is a solution. The entire “table” of results is a “solution sequence”. You speak about “the domain of the solution of the sub-select”. There seems to be some confusion—a sub-select will have zero or more solutions, and there is no such thing as “the” solution. – cygri Jun 06 '19 at 19:31
  • Solutions have a domain. BGPs or other graph patterns do not have a domain (at least not in the SPARQL spec—so if you want to talk about the domain of a BGP, you need to say what you mean by that). That's because each of the zero or more solutions of the graph pattern could have a different domain—there may be different variables bound in the solution. – cygri Jun 06 '19 at 19:32
  • But BGPs and other graph patterns have a set of variables that are “in scope” of the pattern. For example, the pattern `VALUES ?x { UNDEF }` has one solution, and the solution's domain is empty (it has no bound variables), but there is one variable `?x` in scope for the pattern. Same for `?duration` in the first subquery: It will never be in the domain of any of the solutions of the subquery (in other words, it is always unbound), but it is in scope for the subquery. – cygri Jun 06 '19 at 19:36
  • Here is a clearer version of my comments: From 4 and 3, join cares about solutions that are compatible, i.e. (from 3) solutions that agree on the values of shared variables, where shared variables are those that belong to the domain of both solutions. Hence, it is about belonging to the domain of a solution rather than about being bound or not. So I think the question is: does ?duration belong to the domain of the solutions resulting from the Project in the first subselect? (Obviously it does not belong to the domain of the solution of the underlying BGP, considering all variables in the BGP.) – Median Hilal Jun 06 '19 at 19:38
  • Let's break it down to this. The SPARQL specification says: A solution mapping, μ, is a partial function μ : V -> RDF-T. The domain of μ, dom(μ), is the subset of V where μ is defined. The question is: does μ being defined for a variable x mean that x is bound? – Median Hilal Jun 06 '19 at 19:41
  • A variable v is bound in a solution μ if and only if v ∈ dom(μ). – cygri Jun 06 '19 at 19:43
  • I think you are right... From the definition of a function, μ(x) is defined if it has a value, which means x is bound. – Median Hilal Jun 06 '19 at 19:44
  • Then what comes to my mind: is it legal in SPARQL to project on a non-existing variable? I think the specification does not mention that. For example, Virtuoso complains about it, while Blazegraph and Jena do not. GraphDB does not complain about it either, but it joins using the unbound variable, and hence the join yields nothing. – Median Hilal Jun 06 '19 at 19:57
  • Yes, it is legal to project on a variable that is “non-existing” (in other words, it is not in scope for the query pattern). Projection allows any variable, and the spec has no restriction that the variable must be in scope. Virtuoso’s behaviour is non-standard. GraphDB sounds like a bug but I have not seen its behaviour first-hand. – cygri Jun 07 '19 at 07:54