As far as I can tell, your queries are correct, and the problem you observe is a bug in whatever SPARQL engine you're using. At least, when I tried your case on a Sesame store (version 2.8.8), it gave me the expected result.
EDIT: The reason I got correct results is that Sesame just happens to return results in the expected order. As @TallTed correctly remarked, this order is not actually enforced by the query, so it's not something you can depend on. My earlier assertion that this is a bug in the endpoint is therefore wrong.
Let's explore this a bit.
Data I used:
@prefix : <http://example.org/> .
:a-gene a :Gene .
:b-gene a :Gene .
:c-gene a :Gene .
:d-gene a :Gene .
:strain1 a :Strain .
:strain2 a :Strain .
:strain1 :hasGene :a-gene .
:strain1 :hasGene :b-gene .
:strain1 :hasGene :d-gene .
:strain2 :hasGene :b-gene .
:strain2 :hasGene :d-gene .
If we look at the simplest form of the query, we want back all ?s and all ?g, whether or not there is a :hasGene relation between them, and we want them in order. Your initial query was basically this:
PREFIX : <http://example.org/>
select ?s ?g
where { ?g a :Gene.
?s a :Strain.
optional { ?s ?hasGene ?g } .
}
order by ?s
Now, this query, in my Sesame store (and your endpoint as well), returns this:
?s ?g
<http://example.org/strain1> <http://example.org/a-gene>
<http://example.org/strain1> <http://example.org/b-gene>
<http://example.org/strain1> <http://example.org/c-gene>
<http://example.org/strain1> <http://example.org/d-gene>
<http://example.org/strain2> <http://example.org/a-gene>
<http://example.org/strain2> <http://example.org/b-gene>
<http://example.org/strain2> <http://example.org/c-gene>
<http://example.org/strain2> <http://example.org/d-gene>
Looks good, right? Everything is in alphanumeric order. But it's important to realize that the ordering of the ?g column here is a coincidence. If the engine had returned this instead:
?s ?g
<http://example.org/strain1> <http://example.org/c-gene>
<http://example.org/strain1> <http://example.org/b-gene>
<http://example.org/strain1> <http://example.org/a-gene>
<http://example.org/strain1> <http://example.org/d-gene>
<http://example.org/strain2> <http://example.org/b-gene>
<http://example.org/strain2> <http://example.org/a-gene>
<http://example.org/strain2> <http://example.org/c-gene>
<http://example.org/strain2> <http://example.org/d-gene>
...it would also have been a valid result: after all, nowhere does our query say that ?g should be ordered.
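To make this concrete, here is a small Python model (a sketch, not how Sesame or any other engine actually works). Two hypothetical "engines" enumerate the same (?s, ?g) pairs in different internal orders, and both then apply only ORDER BY ?s. Python's sorted() is stable, so rows that tie on ?s keep their incoming order, which mirrors the freedom a SPARQL engine has for ties. The names join_order_1, join_order_2, order_by_s_only and local are made up for illustration.

```python
EX = "http://example.org/"
STRAINS = [EX + "strain1", EX + "strain2"]
GENES = [EX + name for name in ("a-gene", "b-gene", "c-gene", "d-gene")]

# Every (?s, ?g) pair produced by joining `?g a :Gene` with `?s a :Strain`.
# The order in which an engine enumerates them is an implementation detail.
join_order_1 = [(s, g) for s in STRAINS for g in GENES]
join_order_2 = list(reversed(join_order_1))  # a different, equally valid internal order


def order_by_s_only(solutions):
    """Model of ORDER BY ?s: sorted() is stable, so ties on ?s keep their incoming order."""
    return sorted(solutions, key=lambda row: row[0])


def local(iri):
    """Strip the example prefix for display."""
    return iri[len(EX):]


result_1 = order_by_s_only(join_order_1)
result_2 = order_by_s_only(join_order_2)

for (s1, g1), (s2, g2) in zip(result_1, result_2):
    print(f"{local(s1)} {local(g1)}    |    {local(s2)} {local(g2)}")
```

Both results are correctly ordered by ?s, yet within each strain the ?g column comes out a, b, c, d in one and d, c, b, a in the other.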
The solution is straightforward, however: order on both ?s and ?g. Since our particular SPARQL endpoint already returns the correct order "coincidentally" even without this, we can verify that it works with a little trick: reverse the order, using the DESC operator.
Query:
PREFIX : <http://example.org/>
SELECT ?s ?g
WHERE { ?g a :Gene.
?s a :Strain.
OPTIONAL { ?s ?hasGene ?g } .
}
ORDER BY ?s DESC(?g)
Result:
?s ?g
<http://example.org/strain1> <http://example.org/d-gene>
<http://example.org/strain1> <http://example.org/c-gene>
<http://example.org/strain1> <http://example.org/b-gene>
<http://example.org/strain1> <http://example.org/a-gene>
<http://example.org/strain2> <http://example.org/d-gene>
<http://example.org/strain2> <http://example.org/c-gene>
<http://example.org/strain2> <http://example.org/b-gene>
<http://example.org/strain2> <http://example.org/a-gene>
You can see the ?g column is now actually in reverse alphabetical order. This is of course the opposite of what you wanted, but that's easily corrected later by leaving out the DESC part of the query. The point is that this way we have verified that it's our query doing the ordering, not whatever endpoint we are using.
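The same kind of Python model, with ?g added as a secondary sort key, shows why this removes the dependence on the engine: once every pair has a fully specified position, the incoming order no longer matters. This is a sketch; order_by_s_desc_g is a hypothetical helper, and Python string comparison stands in for SPARQL's IRI ordering (for these plain ASCII IRIs it gives the same order as the Sesame output above). Mixed sort directions are handled with two stable sorts, secondary key first.

```python
EX = "http://example.org/"
STRAINS = [EX + "strain1", EX + "strain2"]
GENES = [EX + name for name in ("a-gene", "b-gene", "c-gene", "d-gene")]

rows = [(s, g) for s in STRAINS for g in GENES]


def order_by_s_desc_g(solutions):
    """Model of ORDER BY ?s DESC(?g) using two stable sorts.

    Sort on the secondary key (?g, descending) first, then on the primary key
    (?s, ascending); stability preserves the ?g order within each ?s.
    """
    by_g_desc = sorted(solutions, key=lambda row: row[1], reverse=True)
    return sorted(by_g_desc, key=lambda row: row[0])


# Whatever order the rows arrive in, the result is now the same.
from_forward = order_by_s_desc_g(rows)
from_reversed = order_by_s_desc_g(list(reversed(rows)))

for s, g in from_forward:
    print(s, g)
```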
It still won't fully solve the problem of the ordering in your binary string, though. In your original query, the BIND is part of the graph pattern, which gets fully evaluated first; the GROUP BY with its GROUP_CONCAT then builds the strings, and only after that is ORDER BY applied, to the grouped results. So the ORDER BY clause has no influence on the order of the digits inside the string. That is, if we simply do this query:
PREFIX : <http://example.org/>
SELECT ?s (GROUP_CONCAT(?result ; SEPARATOR="") as ?binary)
WHERE { ?g a :Gene.
?s a :Strain.
OPTIONAL { ?s ?hasGene ?g } .
BIND((IF(BOUND(?hasGene), "1","0")) AS ?result).
}
GROUP BY ?s
ORDER BY ?s DESC(?g)
We still get back this result:
?s ?binary
<http://example.org/strain1> "1101"
<http://example.org/strain2> "0101"
In other words, our binary string is still not inverted, even though DESC(?g) asks for it to be.
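The following Python sketch models that evaluation order under simplifying assumptions: it covers only the pattern, BIND, GROUP BY/GROUP_CONCAT and ORDER BY steps, and the helper names evaluate_pattern and group_concat_by_s are hypothetical. The 1/0 flags are concatenated in whatever order the pattern produced them, and the sort only runs afterwards, on rows that no longer carry ?g.

```python
EX = "http://example.org/"
STRAINS = [EX + "strain1", EX + "strain2"]
GENES = [EX + name for name in ("a-gene", "b-gene", "c-gene", "d-gene")]
HAS_GENE = {
    (EX + "strain1", EX + "a-gene"),
    (EX + "strain1", EX + "b-gene"),
    (EX + "strain1", EX + "d-gene"),
    (EX + "strain2", EX + "b-gene"),
    (EX + "strain2", EX + "d-gene"),
}


def evaluate_pattern():
    """WHERE clause: join, OPTIONAL and BIND, in the engine's own order."""
    for g in GENES:
        for s in STRAINS:
            # OPTIONAL { ?s ?hasGene ?g }: ?hasGene is bound only if a triple links them.
            has_gene = EX + "hasGene" if (s, g) in HAS_GENE else None
            # BIND(IF(BOUND(?hasGene), "1", "0") AS ?result)
            result = "1" if has_gene is not None else "0"
            yield {"s": s, "g": g, "hasGene": has_gene, "result": result}


def group_concat_by_s(solutions):
    """GROUP BY ?s with GROUP_CONCAT(?result; SEPARATOR=""), in arrival order."""
    groups = {}
    for sol in solutions:
        groups.setdefault(sol["s"], []).append(sol["result"])
    return [{"s": s, "binary": "".join(parts)} for s, parts in groups.items()]


grouped = group_concat_by_s(evaluate_pattern())
# ORDER BY runs only now, on the grouped rows. They no longer carry ?g, so in
# this model DESC(?g) has nothing left to act on and cannot reorder the digits.
ordered = sorted(grouped, key=lambda row: row["s"])

for row in ordered:
    print(row["s"], row["binary"])
```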
The solution is to introduce a subquery that delivers the needed results, in order, to the outer query, which then concatenates this ordered result to create the binary string, like so:
PREFIX : <http://example.org/>
SELECT ?s (GROUP_CONCAT(?result ; SEPARATOR="") as ?binary)
WHERE {
  { SELECT ?s ?hasGene
    WHERE { ?g a :Gene.
            ?s a :Strain.
            OPTIONAL {?s ?hasGene ?g.}.
          }
    ORDER BY ?s DESC(?g)
  }
  BIND((IF(BOUND(?hasGene), "1","0")) AS ?result).
}
GROUP BY ?s
The result of this is:
?s ?binary
<http://example.org/strain1> "1011"
<http://example.org/strain2> "1010"
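As you can see, the query now produces the correct (inverted) binary string. Strictly speaking, the SPARQL 1.1 specification does not define the order in which GROUP_CONCAT joins its values, so this still relies on the engine passing the subquery's ordered rows to the aggregate in that order, as Sesame does here. We then need to feed this entire beast into the CONSTRUCT query you wanted, and finally take out that inversion of the binary string.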
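Extending the Python model with a subquery step reproduces the inverted strings. It rests on one explicit assumption: the outer GROUP_CONCAT consumes the subquery's rows in the order the subquery's ORDER BY produced them, which is what Sesame did in this example. inner_select and outer_group_concat are hypothetical helpers, and only ?s and ?hasGene are projected out of the inner step, as in the query.

```python
EX = "http://example.org/"
STRAINS = [EX + "strain1", EX + "strain2"]
GENES = [EX + name for name in ("a-gene", "b-gene", "c-gene", "d-gene")]
HAS_GENE = {
    (EX + "strain1", EX + "a-gene"),
    (EX + "strain1", EX + "b-gene"),
    (EX + "strain1", EX + "d-gene"),
    (EX + "strain2", EX + "b-gene"),
    (EX + "strain2", EX + "d-gene"),
}


def inner_select(descending_g):
    """Subquery: SELECT ?s ?hasGene ... ORDER BY ?s DESC(?g) (or ?s ?g)."""
    rows = [
        (s, g, EX + "hasGene" if (s, g) in HAS_GENE else None)
        for g in GENES
        for s in STRAINS
    ]
    rows.sort(key=lambda row: row[1], reverse=descending_g)  # secondary key: ?g
    rows.sort(key=lambda row: row[0])  # primary key: ?s (stable, keeps ?g order)
    return [(s, has_gene) for s, _g, has_gene in rows]  # project ?s ?hasGene only


def outer_group_concat(rows):
    """Outer query: BIND the 1/0 flag, then GROUP_CONCAT per ?s in arrival order."""
    groups = {}
    for s, has_gene in rows:
        groups.setdefault(s, []).append("1" if has_gene is not None else "0")
    return {s: "".join(bits) for s, bits in groups.items()}


inverted = outer_group_concat(inner_select(descending_g=True))
for s, binary in inverted.items():
    print(s, binary)
```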
The full query then becomes this:
Query 2:
PREFIX : <http://example.org/>
CONSTRUCT {?s :hasBinary ?binary }
WHERE {
  SELECT ?s (GROUP_CONCAT(?result ; SEPARATOR="") as ?binary)
  WHERE {
    { SELECT ?s ?hasGene
      WHERE { ?g a :Gene.
              ?s a :Strain.
              OPTIONAL {?s ?hasGene ?g.}.
            }
      ORDER BY ?s ?g
    }
    BIND((IF(BOUND(?hasGene), "1","0")) AS ?result).
  }
  GROUP BY ?s
}
Result:
<http://example.org/strain1> <http://example.org/hasBinary> "1101" .
<http://example.org/strain2> <http://example.org/hasBinary> "0101" .
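Finally, a compact Python model of the whole CONSTRUCT query, under the same assumption that the concatenation follows the subquery's ORDER BY ?s ?g. It instantiates the template ?s :hasBinary ?binary once per group and prints the triples as N-Triples lines; binary_per_strain and construct_has_binary are made-up names for illustration.

```python
EX = "http://example.org/"
STRAINS = [EX + "strain1", EX + "strain2"]
GENES = [EX + name for name in ("a-gene", "b-gene", "c-gene", "d-gene")]
HAS_GENE = {
    (EX + "strain1", EX + "a-gene"),
    (EX + "strain1", EX + "b-gene"),
    (EX + "strain1", EX + "d-gene"),
    (EX + "strain2", EX + "b-gene"),
    (EX + "strain2", EX + "d-gene"),
}


def binary_per_strain():
    """Inner SELECT ordered by ?s ?g, then GROUP_CONCAT of the 1/0 flags per ?s."""
    rows = sorted((s, g) for s in STRAINS for g in GENES)  # ORDER BY ?s ?g
    bits = {}
    for s, g in rows:
        bits.setdefault(s, []).append("1" if (s, g) in HAS_GENE else "0")
    return {s: "".join(b) for s, b in bits.items()}


def construct_has_binary(bindings):
    """Instantiate the template {?s :hasBinary ?binary} as N-Triples lines."""
    return [f'<{s}> <{EX}hasBinary> "{binary}" .' for s, binary in bindings.items()]


triples = construct_has_binary(binary_per_strain())
for line in triples:
    print(line)
```

The printed lines match the result shown above for the Sesame store.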