
I have a question about RDF and duplicate triples. From perusing the internet, it seems as if duplicate triples are somehow "bad" or a violation of some rule.

But duplicate triples seem to me, on the surface, meaningful.

Suppose I want to represent the fact: Susy(subject) mentions(predicate) Bob(object).

Suppose that I further wanted to represent that Susy mentions Bob five times. Wouldn't having 5 triples of Susy mentions Bob allow me to represent this?

A later query that wants to know how many times Susy mentioned Bob could just ask for the COUNT of this repeated triple.

So my question is: is there anything wrong with this representation of the fact that Susy mentions Bob five times? And if so, what would be the preferred way of representing the fact that Susy mentions Bob five times?

Jeff

1 Answer


In theory, an RDF graph is a set of triples, which means that each triple can occur only once. Of course you could have a document, say in Turtle, that contains duplicate triples or quads, but after loading it into memory or a store those duplicates should be treated as one. Any document is just text, after all.
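To make the set semantics concrete, here is a minimal Python sketch that models a graph as a plain set of (subject, predicate, object) tuples. It only illustrates the abstract model and is not how any particular triple store is implemented; the string names are placeholders taken from the question.

```python
# Model an RDF graph as a set of (subject, predicate, object) tuples.
graph = set()

# "Load" the same triple five times, as a document with duplicates would.
for _ in range(5):
    graph.add((":Susy", ":mentions", ":Bob"))

# A set keeps each element once, so the duplicates collapse into one triple.
triple_count = len(graph)

# Counting matches for the pattern (:Susy :mentions ?o) gives 1, not 5.
match_count = sum(1 for s, p, o in graph if s == ":Susy" and p == ":mentions")
print(triple_count, match_count)
```

Under this model the number of mentions is simply lost, which is why the count has to be stored explicitly.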

That said, I've seen different behaviour depending on the triple store. For example, AllegroGraph by default loads and keeps duplicate triples. There is a manual option to remove the duplicates.

And no, querying will not tell you that you have a duplicate triple, because SPARQL aggregations work with nodes and not whole triples.


Regarding your example, there are multiple ways.

TL;DR: you will need a way to add statements about statements. See this SlideShare for various ways, some of which I briefly describe below.

Complete answer

The easiest is to introduce some kind of artificial intermediary graph node, which could be called Mention or whatever. For example:

:Susan :mentions [
  rdf:type :Mention ;
  :mentionsWhom :Bob ;
  :times 5 
] .

The problem is that this breaks existing semantics should you happen to introduce such a structure into existing data.
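As a rough illustration of how the count would be read back from the Mention structure above, here is a small Python sketch over plain tuples. The label `_:m1` stands in for the blank node, and `times_mentioned` is a hypothetical helper, not part of any RDF library.

```python
# Triples from the Mention example; "_:m1" stands in for the blank node.
graph = {
    (":Susan", ":mentions", "_:m1"),
    ("_:m1", "rdf:type", ":Mention"),
    ("_:m1", ":mentionsWhom", ":Bob"),
    ("_:m1", ":times", 5),
}

def times_mentioned(graph, who, whom):
    """Sum :times over every Mention node linking `who` to `whom`."""
    total = 0
    for s, p, node in graph:
        # Follow who -> :mentions -> node, then check node -> :mentionsWhom -> whom.
        if s == who and p == ":mentions" and (node, ":mentionsWhom", whom) in graph:
            total += sum(o for s2, p2, o in graph if s2 == node and p2 == ":times")
    return total

susan_bob = times_mentioned(graph, ":Susan", ":Bob")
print(susan_bob)
```

Note the extra hop through the intermediary node: any existing query that expects `:Susan :mentions :Bob` directly would no longer match, which is the semantic break mentioned above.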


A simple and widely supported way is to use named graphs, so that you have quads instead of triples. The example below extends Turtle syntax so that it becomes TriG. Note that the named graph is just another resource. Named graphs are also easy to query with any SPARQL processor.

# :susanMentionsBob is the named graph
:susanMentionsBob {
   :Susan :mentions :Bob
}

# we can say more about that graph
:susanMentionsBob :times 5 .
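The lookup this enables can be sketched in plain Python by treating each statement as a quad. This is an assumption-laden illustration only: `None` marks the default graph, and `times_for` is a hypothetical helper. In SPARQL, the first step corresponds to matching the triple inside a `GRAPH ?g { ... }` pattern.

```python
# Quads are (subject, predicate, object, graph_name); None marks the default graph.
quads = {
    (":Susan", ":mentions", ":Bob", ":susanMentionsBob"),
    (":susanMentionsBob", ":times", 5, None),
}

def times_for(quads, s, p, o):
    """Find the named graph(s) holding the triple, then read :times for each."""
    graph_names = {g for (qs, qp, qo, g) in quads
                   if (qs, qp, qo) == (s, p, o) and g is not None}
    return [qo for (qs, qp, qo, g) in quads
            if qs in graph_names and qp == ":times" and g is None]

result = times_for(quads, ":Susan", ":mentions", ":Bob")
print(result)
```

The original triple stays intact inside its graph, so existing queries for `:Susan :mentions :Bob` keep working.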

Another traditional solution is to use a form of reification. With reification you create an rdf:Statement object, to which you can attach additional data. The downside is that you need to repeat the original triple's subject, predicate, and object:

:Susan :mentions :Bob . # actual triple intact
_:reifiedStatement
   rdf:type rdf:Statement ;
   rdf:subject :Susan ;
   rdf:predicate :mentions ;
   rdf:object :Bob ;
   :times 5 . # extra statement about the mention

Lately, more concise approaches to reification have been introduced. You can use a Singleton Property instead. You introduce an extra predicate, which replaces :mentions for a single usage, and you add additional statements about that property:

:Susan :mentions_1 :Bob .
:mentions_1 rdf:singletonPropertyOf :mentions .
:mentions_1 :times 5 .

Note that you can use any name for the :mentions_1 property to avoid collisions. (A literal # is avoided in the name here because in Turtle it would start a comment.) Please have a look at the SlideShare linked above for more examples and SPARQL usage.


Last but not least, a non-standard way, supported only by BigData AFAIK, is Reification Done Right, or RDR. With RDR you can write:

<<:Susan :mentions :Bob>> :times 5 .

By adding double angle brackets you can make statements about statements. This also works in BigData's SPARQL processor.

Brian Vosburgh
Tomasz Pluskiewicz