
I am playing with Blazegraph. I have inserted some triples representing 'events'; each 'event' consists of 3 triples and looks like this:

<%event-iri%> <http://predicates/timestamp> '2020-01-02T03:04:05.000Z'^^xsd:dateTime .
<%event-iri%> <http://predicates/a> %RANDOM_UUID% .
<%event-iri%> <http://predicates/b> %RANDOM_UUID% .

The timestamps represent consecutive moments in time: each event is 1 minute later than the previous one.

I have 10 million 'events' (so 30 million triples) in the graph.
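To reproduce a data set of this shape, a generator along the following lines could be used. This is a minimal sketch, not the author's loader: the event IRI scheme `http://events/<n>` is hypothetical, the UUIDs are assumed to be plain string literals, and the output assumes the `xsd:` prefix is declared.

```python
# Sketch: emit Turtle lines for n consecutive 'events', one minute apart.
# Assumptions (hypothetical): IRIs of the form <http://events/<n>>,
# UUIDs stored as plain string literals, xsd: bound to the XSD namespace.
import uuid
from datetime import datetime, timedelta, timezone

START = datetime(2020, 1, 2, 3, 4, 5, tzinfo=timezone.utc)


def event_triples(n, start=START):
    """Yield 3 Turtle triple lines per event; event i is i minutes after start."""
    for i in range(n):
        iri = f"<http://events/{i}>"  # hypothetical IRI scheme
        ts = (start + timedelta(minutes=i)).strftime("%Y-%m-%dT%H:%M:%S.000Z")
        yield f"{iri} <http://predicates/timestamp> '{ts}'^^xsd:dateTime ."
        yield f"{iri} <http://predicates/a> '{uuid.uuid4()}' ."
        yield f"{iri} <http://predicates/b> '{uuid.uuid4()}' ."


if __name__ == "__main__":
    print("@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .")
    for line in event_triples(3):
        print(line)
```

Because it is a generator, calling `event_triples(10_000_000)` streams the 30 million lines without holding them in memory.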

I run the following query:

select ?event
where {
  ?event <http://predicates/timestamp> ?timestamp .
}
order by ?timestamp
limit 15
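Semantically, `order by ?timestamp limit 15` asks for the 15 bindings with the smallest timestamps. Without a usable index order, every binding has to be read once, but a full sort is not strictly required: a bounded heap can keep only the current best 15. The sketch below shows this general top-k technique in Python; it is an illustration, not a description of what Blazegraph does internally.

```python
import heapq
from datetime import datetime, timedelta


def top_k_by_timestamp(bindings, k=15):
    """Return the k (event, timestamp) pairs with the smallest timestamps.

    Equivalent to sorted(bindings, key=ts)[:k], but heapq.nsmallest keeps
    only about k items in memory (O(k) space, O(n log k) time). Every
    binding is still scanned once.
    """
    return heapq.nsmallest(k, bindings, key=lambda b: b[1])


if __name__ == "__main__":
    start = datetime(2020, 1, 2, 3, 4, 5)
    # Bindings arrive in arbitrary (here: reversed) order, as from an unordered scan.
    bindings = ((f"event-{i}", start + timedelta(minutes=i)) for i in reversed(range(1000)))
    for event, ts in top_k_by_timestamp(bindings, k=3):
        print(event, ts.isoformat())
```

This bounds memory, but it still touches all 10 million timestamps, so on its own it would not bring the query down to index-lookup speed.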

I expect it to be executed efficiently, using the POS index for ordering, but it looks like it sorts all 10 million timestamps in memory instead: the query takes more than 2 minutes to execute.
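The expectation can be illustrated with a toy model of a POS index: a sorted list of `(predicate, object, subject)` tuples standing in for an on-disk B+tree. All entries for one predicate are contiguous and already ordered by object, so the answer to `order by ... limit 15` is just the first 15 entries of that run. This is a hedged sketch that assumes the index key order for the object matches the `ORDER BY` order; whether Blazegraph's planner can rely on that for `xsd:dateTime` literals is exactly what the question asks. The function name `first_k_in_pos_order` is hypothetical.

```python
import bisect
from datetime import datetime, timedelta


def first_k_in_pos_order(pos_index, predicate, k=15):
    """Read the first k (subject, object) pairs for `predicate` from a POS-ordered index.

    pos_index must be a list of (predicate, object, subject) tuples sorted
    ascending. bisect finds the start of the predicate's run in O(log n);
    reading k entries from there costs O(k), with no sort at query time.
    """
    result = []
    # (predicate,) sorts before every (predicate, object, subject) tuple.
    i = bisect.bisect_left(pos_index, (predicate,))
    while i < len(pos_index) and len(result) < k and pos_index[i][0] == predicate:
        _, obj, subj = pos_index[i]
        result.append((subj, obj))
        i += 1
    return result


if __name__ == "__main__":
    t0 = datetime(2020, 1, 2, 3, 4, 5)
    ts = "http://predicates/timestamp"
    triples = [(f"event-{i}", ts, t0 + timedelta(minutes=i)) for i in range(1000)]
    triples += [(f"event-{i}", "http://predicates/a", f"uuid-{i}") for i in range(1000)]
    # The index is sorted once at load time; queries then only read a prefix of a run.
    pos_index = sorted((p, o, s) for s, p, o in triples)
    print(first_k_in_pos_order(pos_index, ts, k=3))
```

The contrast with a full sort is the point: the cost here depends on k and the index depth, not on the 10 million matching triples.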

Adding a hint saying that a range query is safe does not help either. (Strictly speaking, the hint is about range queries, but I tried it anyway because it seems to actually mean 'all objects of this predicate are of the same type'.)

select ?event
where {
  ?event <http://predicates/timestamp> ?timestamp .
  hint:Prior hint:rangeSafe true .
}
order by ?timestamp
limit 15

Still the same 2 minutes and 20 seconds.

In relational databases and MongoDB, you just add an index and such queries run fast.

Is there a way to execute such a query efficiently on Blazegraph?

Roman Puchkovskiy
  • Did you try adding a filter anyway? Without one, it's hard for the query planner to push the query down to the index because of SPARQL semantics. Just try adding a large range filter; maybe that helps, depending on whether an index on this datatype exists. Otherwise, sorting is expensive, given that the 10 million entries will be sorted in memory, putting pressure on the JVM heap. – UninformedUser Nov 28 '20 at 13:31
  • @UninformedUser thank you for the suggestion. I tried adding `filter (?timestamp >= '2000-06-01T16:46:34.801+04:00'^^xsd:dateTime && ?timestamp < '2100-06-02T09:26:34.801+04:00'^^xsd:dateTime)`, but it still takes 2 minutes 21 seconds. The index does exist: I get very fast range queries on it (when they match just a few results), as described here https://stackoverflow.com/questions/65000453/does-blazegraph-support-range-query-optimization – Roman Puchkovskiy Nov 28 '20 at 14:55
  • Try something like this: https://w.wiki/ofU :-) – Stanislav Kralin Nov 29 '20 at 13:09
  • Thank you @StanislavKralin, but I don't see `ORDER BY` in your query. Is the result guaranteed to be ordered by ... any criteria? – Roman Puchkovskiy Nov 29 '20 at 16:34
