
I am dealing with streaming data, and I need to apply SPARQL-style queries to it. For example, consider a query like:

    PREFIX : <http://example.org/>   # hypothetical namespace for the placeholder terms
    SELECT ?x ?z
    FROM <http://dummyURI>
    WHERE { ?x :p1 ?y .    # (t1)
            ?x :p2 ?z .    # (t2)
            ?z :p3 :o3 .   # (t3)
          }

The query contains three triple patterns (t1, t2, and t3), and they constrain one another through shared variables: ?x in (t1) must equal ?x in (t2), and ?z in (t2) must equal ?z in (t3). In my code I can use pattern matching to find the triples that match each triple pattern individually. How do I ensure that these cross-pattern constraints are satisfied?

I tried to understand how SPARQL engines handle this, but it is not covered in the standard resources I looked at (res1, res2, res3). Can anyone help me understand how to handle it?

Note: I have asked a related question at the link. This question is much more concise than the previous one.

Haroon Lone
  • This is still almost certainly too broad. There are lots of ways that a SPARQL engine can be implemented. The issue you seem to be running into is that you're matching triples without any surrounding context. A context is essentially a set of variable bindings. When you want to check whether a triple matches a pattern, you should be working with three things: the actual triple in the data, the pattern, and the context. The pattern might be something like `?s ?p ?o`, and the context might be something like `{?s => , ?p => }`. That means that the effective... – Joshua Taylor Aug 18 '15 at 15:47
  • ...pattern is ` ?o`. Matching a triple against it means checking whether the subject and the property match, and then returning an extended binding `{?s => , ?p => , ?o => ...}`. It will probably be **very** helpful to take a look at the unification algorithm and perhaps some Prolog implementations. I've often found Norvig's [*Correcting a Widespread Error in Unification Algorithms*](http://norvig.com/unify-bug.pdf) a nice short exposition. – Joshua Taylor Aug 18 '15 at 15:50
  • @JoshuaTaylor, I am providing context information during the pattern matching process, and I get the correct answers for each individual triple pattern of the query. The problem arises when different triple patterns of a query share variables. I cannot check whether the answers for the different triple patterns satisfy the constraints on those shared variables. – Haroon Lone Aug 19 '15 at 03:25
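Joshua Taylor's comments describe matching a triple against a pattern under a context and returning an extended binding. Here is a minimal Python sketch of that step. It assumes that triples and patterns are plain 3-tuples of strings and that variables are strings starting with `?`; both conventions are hypothetical and chosen only for illustration. The part that matters for the shared-variable question is the `elif bound != t_term` branch. A variable already bound in the context behaves like a constant.

```python
from typing import Dict, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    # Assumed convention: a term starting with "?" is a variable.
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, context: Binding) -> Optional[Binding]:
    """Match one data triple against one pattern under an existing context.

    Returns the context extended with any newly bound variables, or None if
    the triple is inconsistent with the pattern or with the context.
    """
    extended = dict(context)  # copy, so the caller's context is never mutated
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            bound = extended.get(p_term)
            if bound is None:
                extended[p_term] = t_term  # first occurrence: bind it
            elif bound != t_term:
                return None  # already bound to another value: shared-variable check fails
        elif p_term != t_term:
            return None  # constant position does not match
    return extended


ctx = {"?x": "a"}  # e.g. a binding produced by matching (t1)
print(match(("?x", "p2", "?z"), ("a", "p2", "c"), ctx))  # {'?x': 'a', '?z': 'c'}
print(match(("?x", "p2", "?z"), ("b", "p2", "c"), ctx))  # None
```

Because the context is threaded into each match, you never check the constraints after the fact. A candidate that violates them is simply never produced.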

1 Answer


Put the streaming issue to one side: there are streaming SPARQL engines that deal with it, and there is also a W3C Community Group. A Google search will find them.

Consider the pattern: { ?x p1 ?y . ?x p2 ?z }.

This is a database join, with the constraint that the two patterns agree on the value of ?x.
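The SPARQL 1.1 specification states this constraint in its algebra. Two solution mappings are compatible if they agree on every variable they both bind. Join keeps the merge of every compatible pair drawn from the two solution sets. Applied to per-pattern matches that were found independently, this is exactly the shared-variable check. The sketch below transcribes that definition literally in Python, so it compares every pair (quadratic time); the sample data is made up.

```python
from typing import Dict, List

Binding = Dict[str, str]


def compatible(m1: Binding, m2: Binding) -> bool:
    # Two solution mappings are compatible if they agree on every variable both bind.
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def join(left: List[Binding], right: List[Binding]) -> List[Binding]:
    # Nested-loop form of Join: merge every compatible pair, drop the rest.
    return [{**m1, **m2} for m1 in left for m2 in right if compatible(m1, m2)]


# Hypothetical per-pattern results, each found independently.
t1_solutions = [{"?x": "a", "?y": "b"}, {"?x": "c", "?y": "d"}]  # ?x p1 ?y
t2_solutions = [{"?x": "a", "?z": "e"}, {"?x": "f", "?z": "g"}]  # ?x p2 ?z
print(join(t1_solutions, t2_solutions))  # [{'?x': 'a', '?y': 'b', '?z': 'e'}]
```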

Any join algorithm will work. Take an index join as a reasonably efficient example:

Step 1: Find all matches of ?x p1 ?y.

Step 2: For each match, take the value of ?x and look up ?x p2 ?z with ?x fixed to that value. This is a loop over the values of ?x from step 1. Because it makes a single pass, it streams on pattern one and probes on pattern two.

The output is everything that passes step 2: each step 1 match combined with each ?z found for its ?x.
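Steps 1 and 2 can be sketched in Python as follows. The sketch assumes the data is an in-memory list of `(subject, predicate, object)` string tuples (hypothetical sample data). It also assumes an index keyed on `(subject, predicate)` is available for the probe in step 2. The outer loop is the single streaming pass over pattern one, and the dictionary lookup is the probe on pattern two.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical in-memory data; p1 and p2 are the placeholder predicates.
DATA: List[Triple] = [
    ("a", "p1", "b"),
    ("c", "p1", "d"),
    ("a", "p2", "e"),
    ("a", "p2", "h"),
    ("f", "p2", "g"),
]

# Index on (subject, predicate) so step 2 is a lookup rather than a scan.
by_subject_pred: Dict[Tuple[str, str], List[str]] = defaultdict(list)
for s, p, o in DATA:
    by_subject_pred[(s, p)].append(o)


def index_join() -> Iterator[Dict[str, str]]:
    # Step 1: stream over every match of ?x p1 ?y.
    for s, p, o in DATA:
        if p != "p1":
            continue
        x, y = s, o
        # Step 2: probe the index for ?x p2 ?z with ?x fixed to this value.
        for z in by_subject_pred.get((x, "p2"), []):
            yield {"?x": x, "?y": y, "?z": z}


print(list(index_join()))
# [{'?x': 'a', '?y': 'b', '?z': 'e'}, {'?x': 'a', '?y': 'b', '?z': 'h'}]
```

Note that `c` never appears in the output, because the probe for `("c", "p2")` finds nothing. The constraint on ?x is enforced by the lookup itself.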

There are many join algorithms, from simple nested-loop joins through to parallel hash joins, and many ways to make them more efficient. In the scheme above, it is better to start with the triple pattern expected to generate the fewest matches.
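A minimal sketch of that ordering heuristic follows. It counts how many triples agree with each pattern's constants, treating variables as wildcards, and evaluates the smallest count first. The counting function and the sample data are hypothetical. A real engine would use precomputed statistics rather than scanning the data, and a fully streaming setting may have no counts available at all.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def is_var(term: str) -> bool:
    # Assumed convention: a term starting with "?" is a variable.
    return term.startswith("?")


def count_matches(pattern: Triple, data: List[Triple]) -> int:
    # Count triples that agree with the pattern's constants (variables act as wildcards).
    return sum(
        all(is_var(p) or p == t for p, t in zip(pattern, triple))
        for triple in data
    )


def order_by_selectivity(patterns: List[Triple], data: List[Triple]) -> List[Triple]:
    # Greedy heuristic: evaluate the pattern with the fewest matches first.
    # sorted() is stable, so ties keep their original query order.
    return sorted(patterns, key=lambda pat: count_matches(pat, data))


DATA = [("a", "p1", "b"), ("c", "p1", "d"), ("a", "p2", "e"), ("e", "p3", "o3")]
QUERY = [("?x", "p1", "?y"), ("?x", "p2", "?z"), ("?z", "p3", "o3")]
print(order_by_selectivity(QUERY, DATA))
# [('?x', 'p2', '?z'), ('?z', 'p3', 'o3'), ('?x', 'p1', '?y')]
```

This sketch ignores whether consecutive patterns share a variable. A practical ordering would also prefer patterns connected to ones already evaluated, to avoid cross products.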

For your example, extend this to three patterns by taking the output of step 2 and using each binding of ?z to probe ?z p3 o3.
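Chaining the same probe step over all three patterns gives the sketch below. It is a simple nested-loop version: each stage scans the hypothetical in-memory data instead of using an index. It is written with Python generators, so bindings flow through the stages one at a time. The `match` helper refuses to rebind a variable to a different value, and that refusal is what enforces the ?x and ?z constraints.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    # Assumed convention: a term starting with "?" is a variable.
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, context: Binding) -> Optional[Binding]:
    # Extend `context` so that `pattern` matches `triple`, or return None.
    extended = dict(context)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.setdefault(p_term, t_term) != t_term:
                return None  # shared variable already bound to another value
        elif p_term != t_term:
            return None
    return extended


def extend(solutions: Iterable[Binding], pattern: Triple, data: List[Triple]) -> Iterator[Binding]:
    # One join stage: every incoming binding probes the next pattern.
    for binding in solutions:
        for triple in data:  # full scan here; an index would make this a lookup
            extended = match(pattern, triple, binding)
            if extended is not None:
                yield extended


def evaluate(patterns: List[Triple], data: List[Triple]) -> Iterator[Binding]:
    # Chain the stages lazily: the output of pattern i feeds pattern i + 1.
    solutions: Iterable[Binding] = [{}]  # start from the empty binding
    for pattern in patterns:
        solutions = extend(solutions, pattern, data)
    return iter(solutions)


DATA = [
    ("a", "p1", "b"),
    ("c", "p1", "d"),
    ("a", "p2", "e"),
    ("a", "p2", "h"),
    ("e", "p3", "o3"),
]
QUERY = [("?x", "p1", "?y"), ("?x", "p2", "?z"), ("?z", "p3", "o3")]  # t1, t2, t3

# SELECT ?x ?z: project each full solution onto the selected variables.
print([{v: s[v] for v in ("?x", "?z")} for s in evaluate(QUERY, DATA)])
# [{'?x': 'a', '?z': 'e'}]
```

The binding with `?z = h` survives (t2) but is dropped at (t3), because no triple `h p3 o3` exists.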

If all the data is strictly streaming, see the published work on streaming SPARQL or the work on micro-batches. A parallel hash join can stream on both sides, though it needs a significant amount of working space.
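The answer does not say which parallel hash join it has in mind. One well-known hash join that consumes both inputs as streams is the symmetric (pipelined) hash join. The single-threaded sketch below uses it as a stand-in to show the trade-off mentioned above; it is not parallel. Both hash tables keep every binding seen so far, and that ever-growing state is the working space in question. The `("L", …)`/`("R", …)` arrival format is a hypothetical convention.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Iterator, List, Tuple

Binding = Dict[str, str]


def symmetric_hash_join(arrivals: Iterable[Tuple[str, Binding]], key: str) -> Iterator[Binding]:
    """Join two interleaved binding streams on the variable `key`.

    Each arrival is ("L", binding) or ("R", binding). Every binding is stored in
    its own side's hash table (working space that only grows) and probed against
    the other side's table, so a result appears as soon as both halves arrive.
    """
    tables: Dict[str, DefaultDict[str, List[Binding]]] = {
        "L": defaultdict(list),
        "R": defaultdict(list),
    }
    other = {"L": "R", "R": "L"}
    for side, binding in arrivals:
        k = binding[key]
        tables[side][k].append(binding)
        for partner in tables[other[side]].get(k, []):
            # Also check any other shared variables, not just the hash key.
            shared = partner.keys() & binding.keys()
            if all(partner[v] == binding[v] for v in shared):
                yield {**partner, **binding}


# Hypothetical interleaved arrivals for ?x p1 ?y (L) and ?x p2 ?z (R).
STREAM = [
    ("L", {"?x": "a", "?y": "b"}),
    ("R", {"?x": "f", "?z": "g"}),
    ("R", {"?x": "a", "?z": "e"}),
    ("L", {"?x": "c", "?y": "d"}),
]
print(list(symmetric_hash_join(STREAM, "?x")))  # [{'?x': 'a', '?y': 'b', '?z': 'e'}]
```

Unlike the index join, neither input has to be available in full before results start flowing. The cost is that nothing is ever evicted from either table.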

AndyS