This seems like it should be a simple/common thing, but I haven't found any useful answers that work.

I want to get a page of results (using OFFSET and LIMIT) as well as a COUNT of total results in the same query, without repeating or re-running the query. In other words, I want to run the query once, count the results, get the first n results after some offset, and return just those n results along with the total result count. The exact format for how this is returned doesn't matter; I just need the data.

The closest answer I've found was in "How to get total number of results when using LIMIT in SPARQL?", but the solution boils down to "duplicate the WHERE clause in two subqueries", which seems unnecessary (and presumably evaluates the same pattern twice).

I suspect this can be done with some combination of subqueries and possibly a UNION, but I'm new to SPARQL so my grasp on its semantics isn't very firm yet.

A blatantly invalid example that illustrates what I want to accomplish (but not how I intend to do it):

SELECT (?id OFFSET 5 LIMIT 10 AS ?pageOfResults) (COUNT(?id) AS ?totalResults)
WHERE {
    ?id some:predicate some:object
    ORDER BY ?id
}

The closest I've gotten is embodied by the next two examples. First, one which gives the desired result set (in this case, an extra result that contains the count). This is based on the link above. As noted above, it does so by duplicating the WHERE clause (effectively running the same query twice unless I misunderstand how SPARQL works), which I want to avoid:

SELECT ?id ?count
WHERE {
    {
        SELECT (COUNT(*) as ?count)
        WHERE {
            ?id some:predicate some:object .
        }
    }
    UNION
    {
        SELECT ?id
        WHERE {
            ?id some:predicate some:object .
        }
        ORDER BY ?id
        OFFSET 5 
        LIMIT 10
    }
}

Next, one which comes close to what I want, but which always returns a ?count of 1 (presumably because GROUP BY ?id puts each ?id in its own group, so COUNT(*) counts the rows within each group instead of counting all of the matches). I was trying to get (and COUNT) all of the matches first before passing the ?id up to the outer layer, where OFFSET and LIMIT are applied (and that part seems to work).

SELECT ?id ?count
{
    {
        SELECT ?id (COUNT(*) as ?count)
        WHERE {
            ?id some:predicate some:object .
        }
        GROUP BY ?id
        ORDER BY ?id
    }
}
OFFSET 5
LIMIT 10
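To illustrate why the grouped count comes out as 1, here is a rough Python analogy (not SPARQL, and not how any engine evaluates the query). The ids are made up, and it assumes each ?id matches the pattern exactly once, which should hold for a single triple pattern with a fixed predicate and object:

```python
from collections import Counter

# Hypothetical ids standing in for the subjects matched by
# "?id some:predicate some:object" (one row per id).
matches = ["id1", "id2", "id3", "id4"]

# Analogue of GROUP BY ?id with COUNT(*): one group per distinct id,
# and the count is taken inside each group.
per_group_counts = Counter(matches)

# Analogue of COUNT(*) without GROUP BY: all rows form a single group.
total_count = len(matches)

print(dict(per_group_counts))
print(total_count)
```

Every per-group count is 1 because each group holds a single row, while the ungrouped count covers every match. That is the number I actually want next to each page.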

It would be nice (for this and other purposes) to be able to store the result of the WHERE clause in a variable and then do two separate SELECTs on it (one for the page of results, one for the count), but if that's possible, I haven't seen a way to do it.

ce-nate
  • Blazegraph (and probably AnzoGraph) specific: https://wiki.blazegraph.com/wiki/index.php/NamedSubquery – Stanislav Kralin Jan 03 '20 at 21:38
  • There is no standard way in SPARQL 1.1 - the closest is indeed what you did in query 2. But I don't see why this query would be a problem. Optimizing such things is the query optimizer's job. And for an application, the better option would be to get the total count in a separate query; you don't want to compute the total count for each page – UninformedUser Jan 04 '20 at 04:16
  • You didn't say what engine and/or endpoint you're querying. This detail may lead to more engine/endpoint answers... – TallTed Jan 07 '20 at 18:16
  • @TallTed We're using Amazon Neptune. Pages like this one (https://db-engines.com/en/system/Amazon+Neptune%3BBlazegraph) suggest that it uses Blazegraph, but I haven't been able to get anything other than a generic 500 error when attempting to use named subqueries as Stanislav suggested above. For now I've decided to just have my code inject the query twice, since that seems to be the most reliable solution. – ce-nate Jan 08 '20 at 18:58
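
The "inject the query twice" workaround from the last comment could look roughly like the sketch below. It is only an illustration, not the code actually used: the function name `build_paged_query` is hypothetical, the query shape is copied from the UNION example in the question, and PREFIX declarations are left out just as they are there.

```python
def build_paged_query(pattern: str, offset: int, limit: int) -> str:
    """Build one SPARQL query that returns a page of ?id values plus a total ?count.

    The graph pattern is written once by the caller and injected into both
    subqueries, so the two copies cannot drift apart.
    """
    # int() rejects non-numeric paging values before they reach the query text.
    offset, limit = int(offset), int(limit)
    return f"""SELECT ?id ?count
WHERE {{
    {{
        SELECT (COUNT(*) AS ?count)
        WHERE {{
            {pattern}
        }}
    }}
    UNION
    {{
        SELECT ?id
        WHERE {{
            {pattern}
        }}
        ORDER BY ?id
        OFFSET {offset}
        LIMIT {limit}
    }}
}}"""


# Placeholder pattern taken from the question.
pattern = "?id some:predicate some:object ."
query = build_paged_query(pattern, offset=5, limit=10)
print(query)
```

The pattern still appears twice in the generated text, so whether the engine evaluates it once or twice is left to its optimizer. The sketch only removes the duplication from the application code.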