
I'm interested in retrieving the properties of a Wikidata item, but only those that were added or modified before or after some date.

So I have this SPARQL query that gets all properties for Q24.

SELECT ?itemLabel ?propLabel ?statement_property_objLabel
WHERE {

    VALUES (?item) {(wd:Q24)}

    ?item ?property [?statement_property ?statement_property_obj] .
    ?prop wikibase:claim ?property.
    ?prop wikibase:statementProperty ?statement_property.

    # Call label service.
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }

} ORDER BY ?propLabel

Now, I'd like to keep only those properties that were modified either before (<) or after (>) an arbitrary date (e.g. 1/1/2017). I know there is a "last update" property P5017, but I don't know how to use it to compare against an arbitrary date.

stackoverflowuser2010
  • I'm pretty sure that the Wikidata SPARQL endpoint contains only the latest data and no edit date per statement. There was also a research paper published last year: https://thomas.pellissier-tanon.fr/papers/2019-ESWC-wikidatahistory.pdf - The property P5017 you're referring to is used somehow in Wikidata (run `select * {?s wdt:P5017 ?o} limit 10`), but I don't think this is what you want, given that those values are attached directly to the entities and not to statements, and some of the dates even predate Wikidata. Clearly I might be wrong, others know better. – UninformedUser May 13 '20 at 14:12
  • What @UninformedUser outlines is correct _for the RDF_, not for _the SPARQL endpoint_, which can leverage several services, including the MediaWiki API, which can access revision information, as per https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/MWAPI . If OP is only interested in Q24 or known groups of items, then they might be better off using the MediaWiki API directly or using SQL, e.g. via https://quarry.wmflabs.org/ . – Daniel Mietchen May 13 '20 at 22:42
  • @DanielMietchen I'm aware of the opportunity to use MWAPI via SPARQL - but, I'm still wondering then how you'd get all the properties that have been modified before or after a given date `t1`. In both cases you'd have to get the union of all properties in changesets before or after `t1` - I'm happy to learn new things, especially about Wikidata, so if you can show me the SPARQL solution I'd be very happy – UninformedUser May 14 '20 at 07:33
  • Here is a MediaWiki API call that gets all revisions and their respective content, which could then be filtered by the presence or absence of a specific property: https://www.wikidata.org/w/api.php?action=query&format=json&prop=revisions&titles=Q24&rvprop=timestamp|user|comment|content&rvlimit=500 . Not sure to what extent this is available via the SPARQL endpoint though. – Daniel Mietchen May 22 '20 at 17:43

1 Answer


You probably can't do this with SPARQL, sadly. The only things that SPARQL knows about are:

  • a) the last date the item was edited at all (which gives you an effective "no later than" date for any claim in it) using schema:dateModified;
  • b) any specific dates embedded in claims that state (or hint at) when they were updated.
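For a), here is a minimal sketch of the item-level check, written as a small Python helper that only builds the query text. It assumes the query is sent to the Wikidata Query Service, where the `wd:`, `schema:` and `xsd:` prefixes are predefined; the function name `build_date_modified_query` is made up for illustration.

```python
import re
from datetime import datetime, timezone


def build_date_modified_query(qid, cutoff, after=True):
    """Build a SPARQL query that returns the item only if its last
    edit (schema:dateModified) is after (or before) the cutoff.

    Hypothetical helper: it only produces query text and assumes the
    prefixes predefined by the Wikidata Query Service.
    """
    if not re.fullmatch(r"Q\d+", qid):
        raise ValueError(f"not an item ID: {qid!r}")
    if cutoff.tzinfo is None:
        raise ValueError("cutoff must be timezone-aware")
    op = ">" if after else "<"
    # Normalise the cutoff to UTC in xsd:dateTime form.
    stamp = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        "SELECT ?item ?modified WHERE {\n"
        f"  VALUES ?item {{ wd:{qid} }}\n"
        "  ?item schema:dateModified ?modified .\n"
        f'  FILTER(?modified {op} "{stamp}"^^xsd:dateTime)\n'
        "}"
    )


if __name__ == "__main__":
    print(build_date_modified_query(
        "Q24", datetime(2017, 1, 1, tzinfo=timezone.utc), after=False))
```

Note that this only says whether the item as a whole was last touched before or after the cutoff; it says nothing about which claim changed.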

For b) you could in theory use P813 (date information was retrieved). P5017 is for the date of revision of the "source", not the statement, and can be long in the past.

However, this approach relies on those statements being present. Most references do not use these - Q24 only has one reference that uses P813. It's also not guaranteed that the claim has not been edited since then - you would assume probably not, but there's no way to be sure. They are not automatically applied or updated.
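To see how few statements actually carry such a date, a rough sketch like the following can build a query over reference dates. It is again a Python helper that only produces query text, under the assumption that the query runs on the Wikidata Query Service (where `wd:`, `pr:`, `prov:`, `wikibase:`, `bd:` and `xsd:` are predefined); `build_reference_date_query` is a hypothetical name.

```python
import re
from datetime import datetime, timezone


def build_reference_date_query(qid, ref_prop, cutoff, after=True):
    """Build a SPARQL query listing the properties of an item whose
    statements have a reference with a date property (e.g. P813)
    after (or before) the cutoff.

    Hypothetical helper; it assumes the WDQS predefined prefixes.
    """
    if not re.fullmatch(r"Q\d+", qid):
        raise ValueError(f"not an item ID: {qid!r}")
    if not re.fullmatch(r"P\d+", ref_prop):
        raise ValueError(f"not a property ID: {ref_prop!r}")
    if cutoff.tzinfo is None:
        raise ValueError("cutoff must be timezone-aware")
    op = ">" if after else "<"
    stamp = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        "SELECT ?prop ?propLabel ?refDate WHERE {\n"
        # Statement nodes of the item, and the property each belongs to.
        f"  wd:{qid} ?claim ?statement .\n"
        "  ?prop wikibase:claim ?claim .\n"
        # Reference nodes attached to the statement, with their date.
        "  ?statement prov:wasDerivedFrom ?ref .\n"
        f"  ?ref pr:{ref_prop} ?refDate .\n"
        f'  FILTER(?refDate {op} "{stamp}"^^xsd:dateTime)\n'
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }\n'
        "}"
    )


if __name__ == "__main__":
    print(build_reference_date_query(
        "Q24", "P813", datetime(2017, 1, 1, tzinfo=timezone.utc)))
```

Statements without a matching reference simply drop out of the result, so an empty or short result here reflects missing reference dates rather than a lack of edits.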

References might also have P577 (publication date), which could be used to infer a lower bound on the edit date - if the publication date is 2020-02-01, the claim was probably edited since the start of February, since it would be unlikely that someone would cite a reference with a future publication date. But this is a bit tenuous and not amazingly useful unless it happens to match closely to your test date.

In practice, I think you would need to parse the page history to be able to say anything for sure about when a given claim was last edited. Almost all edit summaries for claim edits are quite standardised so this should hopefully be practical to do without investigating each individual revision, but it might also be a lot of work...
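A rough sketch of that history-parsing idea, in Python. It assumes the revisions have already been fetched from the MediaWiki API with `rvprop=timestamp|comment`, so each revision is a dict with `timestamp` and `comment` keys, and that claim-edit summaries start with `/* wb` and link the property as `[[Property:Pnnn]]`. That summary format is an assumption based on what typical revisions look like, and the function names and sample revisions are made up for illustration.

```python
import re
from datetime import datetime, timezone

# Property links as they appear in auto-generated edit summaries (assumed format).
PROPERTY_LINK = re.compile(r"\[\[Property:(P\d+)\]\]")


def last_edit_per_property(revisions):
    """Map each property ID to the timestamp of the latest revision
    whose edit summary mentions it."""
    latest = {}
    for rev in revisions:
        comment = rev.get("comment", "")
        # Skip manual summaries; only auto-generated ones are parsed.
        if not comment.startswith("/* wb"):
            continue
        ts = datetime.strptime(
            rev["timestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        for pid in PROPERTY_LINK.findall(comment):
            if pid not in latest or ts > latest[pid]:
                latest[pid] = ts
    return latest


def properties_edited(revisions, cutoff, after=True):
    """Properties whose most recent edit is after (or before) the cutoff."""
    latest = last_edit_per_property(revisions)
    return sorted(
        pid for pid, ts in latest.items()
        if (ts > cutoff if after else ts < cutoff)
    )


if __name__ == "__main__":
    # Made-up revisions in the assumed summary format.
    sample = [
        {"timestamp": "2016-03-01T10:00:00Z",
         "comment": "/* wbsetclaim-create:2||1 */ [[Property:P31]]: [[Q5]]"},
        {"timestamp": "2018-06-15T08:30:00Z",
         "comment": "/* wbsetclaim-update:2||1 */ [[Property:P569]]: 1 January 1900"},
        {"timestamp": "2019-01-01T00:00:00Z", "comment": "fixed a typo"},
    ]
    cutoff = datetime(2017, 1, 1, tzinfo=timezone.utc)
    print(properties_edited(sample, cutoff, after=True))
    print(properties_edited(sample, cutoff, after=False))
```

Here "before" means the property has not been touched since the cutoff. Edits made through summaries that do not follow the assumed pattern (manual summaries, merges, reverts) would be missed, so the result is a lower bound rather than a guarantee.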

Andrew is gone