Grouping by blank nodes

Question

I have the following data:

@prefix f: <http://example.org#> .

_:a f:trait "Rude"@en .
_:a f:name "John" .
_:a f:surname "Roy" .
_:b f:trait "Crude"@en .
_:b f:name "Mary" .
_:b f:surname "Lestern" .

When I execute the following query in Blazegraph:

PREFIX f: <http://example.org#>

SELECT ?s ?o
WHERE
{
    ?s f:trait ?o .
}

I get six results:

s   o
t32 Crude
t37 Crude
t39 Crude
t31 Rude
t36 Rude
t38 Rude

If blank nodes _:a and _:b are distinct nodes, how should I write a SPARQL query that returns only two distinct results? I have tried SELECT DISTINCT, but it still returns six results. I have tried grouping by ?o, but Blazegraph returns an error saying it is a bad aggregate. Why do these repeated tuples appear, and how can I avoid them?

What _exactly_ do you mean by "I have the following data"? I suppose your problem is similar to [this](https://sourceforge.net/p/bigdata/discussion/676946/thread/e6d077d0/#d6e3). — Stanislav Kralin, Jun 10 '17 at 21:10
@StanislavKralin I mean that this is the data I have loaded into Blazegraph using the Update tab in the application. It's just a small practice dataset because I'm learning SPARQL. So this might be a bug, if I understand your link correctly. — Gitnik, Jun 10 '17 at 22:18
If you really get 6 results for that query on your sample data in a single graph then something is wrong in Blazegraph. — UninformedUser, Jun 11 '17 at 01:40
Liliane, how many times have you pressed the "Update" button? I guess exactly 3 times. Blank node labels are not URIs; they are "persistent" within the current transaction only. `_:a` in your first update is not the same as `_:a` in your second update. — Stanislav Kralin, Jun 11 '17 at 08:13
@StanislavKralin It seems that you are right. After restarting my PC and loading the data again (pressing Update only once), the query returns exactly two results. After clicking Update once more, the query returns four results, and so on. Can you explain what you meant by "a current transaction"? Why does the second update not override the first one? It seems that the data persists in memory. Could that be due to Java GC? Also, could you post your answer in the answer section so I can accept it as the solution? :) — Gitnik, Jun 11 '17 at 10:09
Each update generates new blank nodes. These are anonymous nodes, so how should a transaction know about the ones from the previous action? — UninformedUser, Jun 11 '17 at 16:26
I just want to add that the problem "resolves" when I switch namespaces and load the data into a new namespace. But if I use the old namespace, where I pressed "Update" 3 times, that query returns six rows. — Gitnik, Jun 11 '17 at 17:34

score 4 · Accepted Answer · edited Jun 20 '20 at 09:12

The problem is that you have inserted data containing blank nodes several times. This operation is not idempotent.
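To make the non-idempotence concrete, here is a minimal sketch in plain Python. It is a toy model, not Blazegraph's implementation: the `load` and `select_traits` helpers and the `t1`, `t2`, ... node names are hypothetical, and the only assumption is that every load maps each blank node label to a fresh node.

```python
import itertools

# Toy model only: this is NOT how Blazegraph is implemented.
# Assumption: blank node labels are scoped to a single load.
_fresh_ids = itertools.count(1)

DATA = [
    ("_:a", "f:trait", "Rude"),
    ("_:a", "f:name", "John"),
    ("_:a", "f:surname", "Roy"),
    ("_:b", "f:trait", "Crude"),
    ("_:b", "f:name", "Mary"),
    ("_:b", "f:surname", "Lestern"),
]


def load(store, triples):
    """Add triples to the store, giving each blank node label a fresh node."""
    mapping = {}  # label -> fresh node, valid for this load only
    for s, p, o in triples:
        if s.startswith("_:"):
            if s not in mapping:
                mapping[s] = f"t{next(_fresh_ids)}"
            s = mapping[s]
        store.add((s, p, o))


def select_traits(store):
    """Rough analogue of: SELECT ?s ?o WHERE { ?s f:trait ?o }"""
    return sorted((s, o) for s, p, o in store if p == "f:trait")


store = set()
row_counts = []
for _ in range(3):  # pressing "Update" three times
    load(store, DATA)
    row_counts.append(len(select_traits(store)))

print(row_counts)  # [2, 4, 6]
```

In this model every row has a different subject, so the rows are already distinct. That is why SELECT DISTINCT has nothing to remove.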

Useful quotes

From RDF 1.1 Concepts and Abstract Syntax:

Blank node identifiers are local identifiers that are used in some concrete RDF syntaxes or RDF store implementations. They are always locally scoped to the file or RDF store, and are not persistent or portable identifiers for blank nodes.

From RDF 1.1 Semantics:

RDF graphs can be viewed as conjunctions of simple atomic sentences in first-order logic, where blank nodes are free variables which are understood to be existential. Taking the union of two graphs is then analogous to syntactic conjunction in this syntax. RDF syntax has no explicit variable-binding quantifiers, so the truth conditions for any RDF graph treat the free variables in that graph as existentially quantified in that graph. Taking the union of graphs which share a blank node changes the implied quantifier scopes.

From SPARQL 1.1 Query Language:

Blank node labels are scoped to a result set.

There need not be any relation between a label _:a in the result set and a blank node in the data graph with the same label.

An application writer should not expect blank node labels in a query to refer to a particular blank node in the data.

From SPARQL 1.1 Update:

Blank nodes... are assumed to be disjoint from the blank nodes in the Graph Store, i.e., will be inserted with "fresh" blank nodes.

Some discussion

Different triplestores provide workarounds for the "problems" described. For example, Jena allows pseudo-URIs such as <_:b1>.
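If the duplicates are already in the store, one store-independent workaround is to compare the values attached to the blank nodes instead of the nodes themselves. The sketch below shows the idea in plain Python on hypothetical result rows (the node labels are taken from the question; pairing them with names and surnames is an assumption). The SPARQL analogue would be to project the literal values with SELECT DISTINCT instead of projecting ?s.

```python
# Hypothetical rows, as a query returning ?s ?name ?surname ?trait might
# produce after the same data has been loaded three times.
rows = [
    ("t31", "John", "Roy", "Rude"),
    ("t36", "John", "Roy", "Rude"),
    ("t38", "John", "Roy", "Rude"),
    ("t32", "Mary", "Lestern", "Crude"),
    ("t37", "Mary", "Lestern", "Crude"),
    ("t39", "Mary", "Lestern", "Crude"),
]

# Drop the blank node column and keep only distinct value combinations.
people = sorted({(name, surname, trait) for _node, name, surname, trait in rows})

print(people)  # [('John', 'Roy', 'Rude'), ('Mary', 'Lestern', 'Crude')]
```

Note that this treats two nodes with identical values as the same person, which may or may not be what you want.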
