How to get associated (English) Wikipedia page from Wikidata page / Q number using Wikidata dump?

Question

For @en text alone, a single item from the Wikidata dump contains multiple names:

<http://www.wikidata.org/entity/Q26> <http://www.w3.org/2000/01/rdf-schema#label> "Northern Ireland"@en .
<http://www.wikidata.org/entity/Q26> <http://www.w3.org/2004/02/skos/core#prefLabel> "Northern Ireland"@en .
<http://www.wikidata.org/entity/Q26> <http://schema.org/name> "Northern Ireland"@en .

On the Wikidata page for this article (http://www.wikidata.org/entity/Q26), which of these (if any) corresponds to the canonical title used on the associated English Wikipedia page?

None of those; the Wikipedia article is linked as a sitelink, not a label. See https://opendata.stackexchange.com/questions/6050/get-wikipedia-urls-sitelinks-in-wikidata-sparql-query. An (outdated) partial dump of sitelinks also exists: http://tools.wmflabs.org/wikidata-exports/rdf/exports/20160801/dump_download.html (Stanislav Kralin, Jan 20 '18 at 10:48)
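The SPARQL route from the linked question can be sketched with only the standard library. This is a minimal sketch, assuming the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql and the standard SPARQL 1.1 JSON results format; `build_query`, `parse_articles`, and `fetch_enwiki_article` are hypothetical helper names, not part of any library, and the User-Agent string is a placeholder.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"  # public query service (assumed reachable)


def build_query(qid, site="https://en.wikipedia.org/"):
    """Return a SPARQL query selecting the sitelink of `qid` on `site`."""
    return (
        "PREFIX schema: <http://schema.org/>\n"
        "PREFIX wd: <http://www.wikidata.org/entity/>\n"
        "SELECT ?article WHERE {\n"
        f"  ?article schema:about wd:{qid} ;\n"
        f"           schema:isPartOf <{site}> .\n"
        "}"
    )


def parse_articles(results):
    """Extract the ?article URIs from a SPARQL JSON results document."""
    return [row["article"]["value"] for row in results["results"]["bindings"]]


def fetch_enwiki_article(qid):
    """Run the query against the live endpoint (requires network access)."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": build_query(qid)})
    req = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Wikimedia asks clients for a descriptive User-Agent; this one is a placeholder.
            "User-Agent": "sitelink-example/0.1 (placeholder contact)",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return parse_articles(json.load(resp))
```

If the item has an English sitelink, `fetch_enwiki_article("Q26")` should return a one-element list holding the article URL; an empty list means no English article is linked.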

score 1 · Answer 1 · answered Jan 22 '18 at 21:55

Grab the triple whose predicate is schema:isPartOf and whose object is the Wikipedia you want (for example, https://en.wikipedia.org/). The subject of that triple is the article URL.
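To get from that article URL to the human-readable page title the question asks about, take the last path segment. A small sketch, assuming English Wikipedia's `/wiki/<Title>` URL layout with underscores standing for spaces and other characters percent-encoded; `title_from_article_url` is a hypothetical helper name.

```python
from urllib.parse import unquote, urlparse


def title_from_article_url(url):
    """Turn an article URL such as .../wiki/Northern_Ireland into 'Northern Ireland'.

    Assumes the /wiki/<Title> path layout: underscores stand for spaces and
    other characters are percent-encoded (decoded here as UTF-8).
    """
    path = urlparse(url).path
    prefix = "/wiki/"
    if not path.startswith(prefix):
        raise ValueError(f"not a /wiki/ article URL: {url}")
    return unquote(path[len(prefix):]).replace("_", " ")
```

For example, `title_from_article_url("https://en.wikipedia.org/wiki/Northern_Ireland")` returns `"Northern Ireland"`.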

Here's an example using Python's rdflib:

>>> import rdflib
>>> g = rdflib.Graph()
>>> _ = g.parse("https://www.wikidata.org/entity/Q26.nt", format="nt")  # assigning hides the returned Graph's repr
>>> is_part_of = rdflib.URIRef("http://schema.org/isPartOf")
>>> enwiki = rdflib.URIRef("https://en.wikipedia.org/")
>>> for s in g.subjects(is_part_of, enwiki):  # subjects of (?, schema:isPartOf, enwiki)
...     print(s)
...
https://en.wikipedia.org/wiki/Northern_Ireland

You can adjust this approach according to whatever parser you're using, of course.
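For a full dump, loading everything into an rdflib `Graph` is rarely practical, so one way to adapt the idea is to stream an N-Triples file line by line and join the `schema:about` and `schema:isPartOf` triples that share the same article subject. This is a sketch under stated assumptions: it assumes each sitelink appears in the dump as an article-URL subject carrying both predicates, and that article URLs start with the site URL (used only to cut memory). The regex handles only IRI objects and is not a full N-Triples parser; `enwiki_sitelinks` and `read_dump` are hypothetical names.

```python
import gzip
import re
from typing import Dict, Iterable, Iterator

# Matches "<subject> <predicate> <object> ." lines whose object is an IRI.
IRI_TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")

ABOUT = "http://schema.org/about"
IS_PART_OF = "http://schema.org/isPartOf"
ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def enwiki_sitelinks(lines: Iterable[str], site: str = "https://en.wikipedia.org/") -> Dict[str, str]:
    """Map Q-ids to article URLs on `site`, reading N-Triples lines one at a time."""
    about: Dict[str, str] = {}  # article URL -> Q-id
    on_site = set()             # article URLs confirmed via schema:isPartOf
    for line in lines:
        m = IRI_TRIPLE.match(line)
        if not m:
            continue  # literal objects, comments, blank lines
        s, p, o = m.groups()
        if not s.startswith(site):
            continue  # assumption: articles on `site` live under the site URL
        if p == ABOUT and o.startswith(ENTITY_PREFIX):
            about[s] = o[len(ENTITY_PREFIX):]
        elif p == IS_PART_OF and o == site:
            on_site.add(s)
    return {qid: article for article, qid in about.items() if article in on_site}


def read_dump(path: str) -> Iterator[str]:
    """Yield lines from a plain or gzip-compressed N-Triples file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as fh:
        yield from fh
```

Usage would look like `enwiki_sitelinks(read_dump("latest-all.nt.gz"))`, where the filename is a placeholder for whichever N-Triples dump you downloaded; on a full dump expect this to take a long time and to hold one entry per English article in memory.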
