
Take this graph:

:thing1 a :Foo ;
    :has :A ;
    :has :B .

:thing2 a :Foo ;
    :has :B ;
    :has :A .

:thing3 a :Foo ;
    :has :A ;
    :has :B ;
    :has :C .

I want to select :thing1 and :thing2, but NOT :thing3.

Here is the SPARQL query I wrote that works. Is there a better way to do this?

SELECT ?foo WHERE {
    ?foo a :Foo ;
        :has :A ;
        :has :B .
    MINUS {
        ?foo a :Foo ;
            :has :A ;
            :has :B ;
            :has ?anythingElse .
        FILTER(?anythingElse != :A && ?anythingElse != :B)
    }
}
Bhargav Rao
Blake Regalia

2 Answers


An alternative to MINUS is FILTER NOT EXISTS:

SELECT ?foo WHERE {
    ?foo a :Foo ;
        :has :A, :B .
    FILTER NOT EXISTS {
        ?foo :has ?other .
        FILTER (?other NOT IN (:A, :B))
    }
}

which says, loosely, find all ?foo with :A and :B, then check that they have no other :has value.
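
For anyone who wants to run this directly: the queries in this thread omit the prefix declaration. A self-contained version might look like the following, assuming the empty prefix maps to http://example.org/ (a placeholder namespace that is not given in the question):

```sparql
# Assumption: ":" is bound to <http://example.org/>; substitute your data's real namespace.
PREFIX : <http://example.org/>

SELECT ?foo WHERE {
    # Candidates: every :Foo that has both :A and :B.
    ?foo a :Foo ;
        :has :A, :B .
    # Discard any candidate that has some :has value other than :A or :B.
    FILTER NOT EXISTS {
        ?foo :has ?other .
        FILTER (?other NOT IN (:A, :B))
    }
}
```

Against the three sample resources in the question, this should return :thing1 and :thing2 only, since :thing3 also has :C.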

In terms of execution efficiency, some optimizers can turn certain MINUS patterns into FILTER NOT EXISTS and vice versa, and they may also be able to share common sub-patterns between the two parts of the query.

Without an optimizer that smart, FILTER NOT EXISTS is likely to be faster because the pattern "?foo a :Foo ; :has :A, :B ." is not repeated, and the FILTER only considers items that have already matched that pattern.

The only way to know for sure is to try both on real data, where all effects, including caching, come together.
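
If you are benchmarking anyway, a third formulation that may be worth including is to count the distinct :has values per resource. This is not from the question or from the queries above; it is only a sketch, assuming http://example.org/ as a placeholder namespace and assuming the store holds no inferred or otherwise unexpected :has triples:

```sparql
# Assumption: ":" is bound to <http://example.org/>; substitute your data's real namespace.
PREFIX : <http://example.org/>

SELECT ?foo WHERE {
    # Require :A and :B, then bind every :has value so it can be counted.
    ?foo a :Foo ;
        :has :A, :B ;
        :has ?value .
}
GROUP BY ?foo
# Exactly two distinct values means :A and :B and nothing else.
HAVING (COUNT(DISTINCT ?value) = 2)
```

The constant 2 has to be kept in sync with the number of required values, a coupling the FILTER NOT EXISTS form does not have.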

AndyS
  • I do believe this is in fact better because the minus block from the original post will search the entire graph, whereas filter not exists will only search (and then remove) matching triples from the results of the outer block. In other words, this solution should have better performance since it removes the triples that `:has ?other` from the subset of triples that `:has :A, :B`. Would you mind adding this explanation into your solution? Then I'd be happy to accept it as the answer :) – Blake Regalia Mar 16 '15 at 18:34
  • It's a little more complicated than that :-). An optimizer could detect that the MINUS form can be transformed into a FILTER NOT EXISTS form, or it can see that the MINUS block can share computation with the main part and exploit that. However, they are tricky optimizations (they don't always work) and I don't believe many systems attempt them. In ARQ, the FILTER NOT EXISTS form is faster for this case, where the MINUS has a lot of pattern in common with the main part. – AndyS Mar 17 '15 at 19:23

You can do this using the NOT IN operator instead of a boolean expression, and indeed there is no need to repeat the three triple patterns if you replace the MINUS clause with a FILTER NOT EXISTS clause:

SELECT ?foo WHERE {
    ?foo a :Foo ;
        :has :A, :B .
    FILTER NOT EXISTS {
        ?foo :has ?other .
        FILTER (?other NOT IN (:A, :B))
    }
}

I doubt there'll be a significant difference in performance, but the query is shorter and easier to read.

Jeen Broekstra
  • Shouldn't that be `NOT IN`? And if we're making the query shorter, we can abbreviate `:has :A ; :has :B` as `:has :A, :B`, too. :) – Joshua Taylor Mar 16 '15 at 02:30
  • This solution is wrong because the minus block will match all triples where `?foo :has ?other`, and then remove the ones that don't match the filter expression, and then finally subtract those from the results of the outer block. That causes `:thing3` to be returned from this query – Blake Regalia Mar 16 '15 at 03:29
  • @BlakeRegalia Yeah, it should be `NOT IN` instead of `IN`, my bad. Fixed now, see edited answer. – Jeen Broekstra Mar 16 '15 at 07:47
  • The MINUS block needs "?foo a :Foo ; :has :A, :B ." because the left and right sides are evaluated separately, then MINUS'ed, in order to catch ":thing4 :has :D" with no other :thing4 triples. – AndyS Mar 16 '15 at 10:06
  • @JeenBroekstra - I think you're overlooking my comment. The solution is fundamentally wrong because you actually do need to repeat the triple patterns in the minus block. As I said, and as @AndyS mentions, the minus block is evaluated independently - meaning that `?foo :has ?other` will match any triple with the `:has` predicate, then remove triples not matching the filter expressions, and finally subtract those from the results of the outer block – Blake Regalia Mar 16 '15 at 18:18
  • @BlakeRegalia ah of course. I originally rewrote it to use `FILTER NOT EXISTS` instead of `MINUS` and then changed it back, but overlooked this. Edited once again. – Jeen Broekstra Mar 16 '15 at 18:53