
Many datasets have a history of changes, and making historical data available as Linked Data can be a challenge. The general case I am considering is a dataset describing things whose properties can change over time. An example is the history of Windsor Castle: it has had many configurations in the past, but it can still be considered the same thing. One way to handle this would be to annotate individual properties with time, but that leads into the awkward territory of metadata per RDF triple. I think a simpler solution is to think in terms of versions of things: when one or more properties of a resource change, a new version comes into existence.

Below is a simple example of someone who changes his name on a certain date:

```turtle
@prefix : <http://www.example.com/mydataset/> .
@base <http://www.example.com/mydataset/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Current version: the main URI always carries the latest data.
:p1 a foaf:Person ;
  foaf:name "Bob" ;
  dcterms:valid "start=2015-06-20;" ;
  dcterms:replaces <p1/version1> .

# Previous version, linked back to the resource it is a version of.
<p1/version1> a foaf:Person ;
  foaf:name "Alfred" ;
  dcterms:valid "start=1975-08-01; end=2015-06-19;" ;
  dcterms:isVersionOf :p1 ;
  dcterms:isReplacedBy :p1 .
```

In this example, the main URI (:p1) always points at the most recent version. That is useful, because historical data is not always needed. The current data does link to the previous version, and the properties dcterms:replaces and dcterms:isReplacedBy can link older versions into a chain.
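To make the chain concrete, here is a minimal Python sketch (not part of the original question) that models the two versions from the Turtle example as plain dictionaries and walks the `dcterms:replaces` links back from the main URI until it finds the version valid on a given date. It assumes the `dcterms:valid` bounds have already been converted to `datetime.date` values and that end dates are inclusive; the dictionary layout and the function name `version_at` are hypothetical.

```python
from datetime import date
from typing import Optional

# Hypothetical in-memory view of the Turtle example: each resource maps to
# its name, its validity bounds, and the resource it replaces (if any).
VERSIONS = {
    "p1": {
        "name": "Bob",
        "start": date(2015, 6, 20),
        "end": None,  # open-ended: this is the current version
        "replaces": "p1/version1",
    },
    "p1/version1": {
        "name": "Alfred",
        "start": date(1975, 8, 1),
        "end": date(2015, 6, 19),
        "replaces": None,
    },
}


def version_at(main_uri: str, when: date) -> Optional[str]:
    """Follow the replaces chain from the current version until a version
    whose [start, end] interval (end inclusive) contains `when`."""
    uri: Optional[str] = main_uri
    while uri is not None:
        v = VERSIONS[uri]
        starts_ok = v["start"] <= when
        ends_ok = v["end"] is None or when <= v["end"]
        if starts_ok and ends_ok:
            return uri
        uri = v["replaces"]  # step back to the older version
    return None  # no version was valid on that date


print(version_at("p1", date(2000, 1, 1)))  # p1/version1
print(version_at("p1", date(2020, 1, 1)))  # p1
print(version_at("p1", date(1970, 1, 1)))  # None
```

Because the current data sits at the main URI, a client that only needs "now" never follows the chain; only historical lookups pay the cost of walking it.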

I like this setup because it is straightforward and does not depend on something like SPARQL to work. However, specifying temporal validity is a problem. The only appropriate term I could find is dcterms:valid, but its range is a literal. That works with the DCMI Period Encoding Scheme, but I think it would be much more useful to be able to use common datatypes for time, such as xsd:dateTime or xsd:gYear. That would greatly help with querying (by time range or by point in time) and with ordering the data. For example, SPARQL's date comparisons and functions such as YEAR() are defined on the xsd:dateTime datatype; a DCMI Period string would only compare as plain text.
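The extra work a plain DCMI Period literal imposes is that every consumer has to parse it before comparing or sorting. As a rough illustration, the Python sketch below parses the two literals from the example into `datetime.date` values. It assumes the default W3C-DTF scheme with full `YYYY-MM-DD` dates only; coarser values such as a bare year are not handled, and other components like `name=` or `scheme=` are read but ignored. The function name `parse_dcmi_period` is hypothetical.

```python
from datetime import date
from typing import Dict, Optional, Tuple


def parse_dcmi_period(literal: str) -> Tuple[Optional[date], Optional[date]]:
    """Parse a DCMI Period string such as 'start=1975-08-01; end=2015-06-19;'
    into (start, end) dates. Missing components come back as None.

    Assumption: values are W3C-DTF full dates (YYYY-MM-DD); other components
    such as 'name=' or 'scheme=' are collected but ignored."""
    parts: Dict[str, str] = {}
    for component in literal.split(";"):
        component = component.strip()
        if not component:
            continue  # skip the empty piece after a trailing ';'
        key, _, value = component.partition("=")
        parts[key.strip()] = value.strip()

    def to_date(key: str) -> Optional[date]:
        return date.fromisoformat(parts[key]) if key in parts else None

    return to_date("start"), to_date("end")


print(parse_dcmi_period("start=1975-08-01; end=2015-06-19;"))
# (datetime.date(1975, 8, 1), datetime.date(2015, 6, 19))
print(parse_dcmi_period("start=2015-06-20;"))
# (datetime.date(2015, 6, 20), None)
```

Once both bounds are real dates, ordering and point-in-time checks become ordinary comparisons, which is roughly what a typed literal would provide without the parsing step.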

So my question is: Can someone suggest a simple versioning scheme for Linked Data that can use common data types for time? Or maybe just an alternative for dcterms:valid?

UPDATE: It was suggested that I look at PROV, which provides semantics for provenance, for alternatives. PROV does include the concept of validity, and an attempt has been made to map dct:valid to PROV. My reputation is too low to post additional hyperlinks, so I quote:

> dct:valid: "Date (often a range) of validity of a resource." This property could correspond to PROV's generation and invalidation of the resource or one of its specializations. However, dct:valid can be used to set expiry dates (e.g., resource valid until 2015), which is not provenance (it is not about past events). Thus this property is left out of the mapping.

For historical data, which is what this question is about, the fact that dct:valid can set future dates does not matter, so PROV's generation and invalidation could still be applicable. The relevant PROV terms seem to be prov:generatedAtTime and prov:invalidatedAtTime, which could express the temporal validity of a version. However, the range of those properties is xsd:dateTime, which means each time needs to be known down to the second. Especially for historical data from before the digital age, that is not always known; sometimes all that is known is a year or a date. So PROV seems too restrictive in another way.
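If coarser literals such as `"1975"^^xsd:gYear` are used anyway, a point-in-time query can no longer always give a plain yes or no. One way to handle this, sketched below in Python as an assumption rather than anything PROV or Dublin Core prescribes, is to treat each value as the range of days it covers and to return `None` when the precision is too coarse to decide. The function names `value_span` and `generated_by` are hypothetical; time zones, negative years and full `xsd:dateTime` values are not handled.

```python
import calendar
from datetime import date
from typing import Optional, Tuple


def value_span(lexical: str) -> Tuple[date, date]:
    """Return the first and last day covered by a year ('1975'),
    a year-month ('1975-08') or a full date ('1975-08-01').
    Sketch only: time zones, negative years and dateTimes are not handled."""
    parts = [int(p) for p in lexical.split("-")]
    if len(parts) == 1:  # shaped like xsd:gYear
        year = parts[0]
        return date(year, 1, 1), date(year, 12, 31)
    if len(parts) == 2:  # shaped like xsd:gYearMonth
        year, month = parts
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    if len(parts) == 3:  # shaped like xsd:date
        day = date(parts[0], parts[1], parts[2])
        return day, day
    raise ValueError(f"unsupported value: {lexical!r}")


def generated_by(generated: str, when: date) -> Optional[bool]:
    """True if a version generated at `generated` certainly existed on `when`,
    False if it certainly did not exist yet, None if the value is too coarse
    to tell (including the day itself when only the date is known)."""
    first, last = value_span(generated)
    if when < first:
        return False
    if when > last:
        return True
    return None


print(generated_by("1975", date(1976, 1, 1)))        # True
print(generated_by("1975", date(1974, 12, 31)))      # False
print(generated_by("1975", date(1975, 8, 1)))        # None
print(generated_by("1975-08-01", date(1975, 8, 2)))  # True
```

The three-valued result keeps the uncertainty explicit instead of silently rounding "1975" to 1 January, which would make the version appear to exist earlier than the data supports.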

  • This is not a good Q for this site, because there is no simple answer. RDF has no inherent concept of versioning. Anything described in RDF "is." All solutions for temporality are kludges to some degree. You might look at [PROV](https://www.w3.org/TR/prov-primer/) which covers much in this area. – TallTed Jan 24 '17 at 15:57
  • @TallTed is correct; PROV does enable attribution and intervals. – Jay Gray Jan 25 '17 at 10:14
  • @TallTed: do you think the question does not belong here or is it just the title that is misleading? I did not mean versioning in RDF, but versioning in RDF based data, or Linked Data that is based on RDF. About PROV: it is a good lead, but it seems it does not provide a full solution. I will update the original question to include PROV. – F.J. Knibbe Jan 25 '17 at 13:25
  • I think you have an interesting problem and PROV is a very interesting approach, but I don't think it solves your problem. As for whether this is a good question for SO, it's hard to say, but I'd vote to keep it here. – Sentry Jan 25 '17 at 13:56
  • Hm. As a participant in the PROV Working Group, I know the intent was to permit vague notations like the year a work of art was created, where we might know no other timing details. A quick look suggests that might not have been communicated properly, or even be true, in the spec as completed. That said, RDF ontologies are not enforced like SQL schema definitions. The ontology saying the range is `xsd:dateTime` does not prevent you from treating this range as including `xsd:date` and/or `xsd:gYear`... – TallTed Jan 26 '17 at 00:06
  • @TallTed: Thank you for clarifying and providing a way out. But wouldn't it be funny to ignore the range specification of PROV properties? They must have a reason for being there in the first place. – F.J. Knibbe Jan 27 '17 at 14:28
  • Yes, and the reason in this case was to say "this property should hold a datetime" but not to say "this property's value must be precise to the second"... – TallTed Jan 27 '17 at 14:35
  • For more advanced cases of versioning the resources, allowing variants of those resources as well as defining sets of valid resources at certain versions, look at the OSLC Configuration Management specification that is targeted at the industrial ALM/PLM configuration needs: https://tools.oasis-open.org/version-control/browse/wsvn/oslc-ccm/trunk/specs/config-mgt/oslc-config-mgt.html – berezovskyi May 01 '17 at 15:43

1 Answer


One vocabulary that supports this kind of change is ChangeSet:

http://vocab.org/changeset/

If you model with it, you have your data on the one hand and metadata about the changes on the other.

ChristophE
  • I found something similar, but I don't know how it compares to your vocabulary: http://topbraid.org/change – Sentry Jan 25 '17 at 13:47
  • I would just compare the vocabularies and see which one best fits what you want to accomplish. – ChristophE Jan 25 '17 at 13:50
  • Thanks, I looked at the changeset vocabulary before. The examples aren't available at the moment, but https://www.w3.org/2009/12/rdf-ws/papers/ws07 provides an example. I got the impression the changeset vocabulary is meant more for tracking editing changes in a dataset than expressing how a resource can have different historical versions. I think those are different use cases. I will update the original question to clarify that, using the history of Windsor Castle as an example. – F.J. Knibbe Jan 25 '17 at 14:12
  • The topbraid change ontology seems to serve a similar purpose as the changeset vocabulary: recording transactions in a dataset. I think I am looking for something else: recording the time intervals of validity of resource descriptions. – F.J. Knibbe Jan 25 '17 at 15:09