
Considering only @en literals, a single item in the Wikidata dump has several names:

<http://www.wikidata.org/entity/Q26> <http://www.w3.org/2000/01/rdf-schema#label> "Northern Ireland"@en .
<http://www.wikidata.org/entity/Q26> <http://www.w3.org/2004/02/skos/core#prefLabel> "Northern Ireland"@en .
<http://www.wikidata.org/entity/Q26> <http://schema.org/name> "Northern Ireland"@en .

On the Wikidata page for this item (http://www.wikidata.org/entity/Q26), which of these (if any) corresponds to the canonical title used on the associated English Wikipedia page?

EmJ
zadrozny
    None of those. See https://opendata.stackexchange.com/questions/6050/get-wikipedia-urls-sitelinks-in-wikidata-sparql-query . There is also an (outdated) partial dump of sitelinks: http://tools.wmflabs.org/wikidata-exports/rdf/exports/20160801/dump_download.html – Stanislav Kralin Jan 20 '18 at 10:48

1 Answer


Find the triple whose predicate is schema:isPartOf and whose object is the Wikipedia you want (for example, https://en.wikipedia.org/). Its subject is the URL of the article on that Wikipedia.

Here's an example using Python's rdflib:

>>> import rdflib
>>> g = rdflib.Graph()
>>> r = g.parse("https://www.wikidata.org/entity/Q26.nt", format="nt")
>>> is_part_of = rdflib.URIRef("http://schema.org/isPartOf")
>>> enwiki = rdflib.URIRef("https://en.wikipedia.org/")
>>> for article in g.subjects(is_part_of, enwiki):
...     print(article)
...
https://en.wikipedia.org/wiki/Northern_Ireland

You can adjust this approach according to whatever parser you're using, of course.
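If you would rather not depend on an RDF library at all, the same lookup can be sketched with the Python standard library. This is a minimal sketch, assuming well-formed N-Triples with one triple per line (no blank nodes, no escaped quotes inside literals). The helper names `parse_ntriples` and `sitelink` are hypothetical, and `SAMPLE` is a hand-written excerpt assumed to mirror the sitelink triples Wikidata emits for Q26. Besides the article URL, it reads the sitelink node's own schema:name, which holds the article title; the item-level label (the first triple) need not match that title in general.

```python
import re

# One N-Triples line: "<subject> <predicate> object ." where object is an IRI
# or a literal with an optional @lang or ^^<datatype> suffix.
# Simplified: blank nodes and escaped quotes inside literals are not handled.
TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+(<[^>]*>|"[^"]*"(?:@[\w-]+|\^\^<[^>]*>)?)\s*\.\s*$'
)
SCHEMA = "http://schema.org/"

# Hand-written excerpt (assumed to match Wikidata's sitelink layout).
SAMPLE = """\
<http://www.wikidata.org/entity/Q26> <http://schema.org/name> "Northern Ireland"@en .
<https://en.wikipedia.org/wiki/Northern_Ireland> <http://schema.org/about> <http://www.wikidata.org/entity/Q26> .
<https://en.wikipedia.org/wiki/Northern_Ireland> <http://schema.org/isPartOf> <https://en.wikipedia.org/> .
<https://en.wikipedia.org/wiki/Northern_Ireland> <http://schema.org/name> "Northern Ireland"@en .
"""


def parse_ntriples(text):
    """Yield (subject, predicate, object); IRI objects lose their angle
    brackets, literal objects keep their quotes and language/datatype tag."""
    for line in text.splitlines():
        m = TRIPLE.match(line.strip())
        if m:
            s, p, o = m.groups()
            if o.startswith("<"):
                o = o[1:-1]
            yield s, p, o


def sitelink(triples, item, wiki="https://en.wikipedia.org/"):
    """Return (article_url, title) for the sitelink of `item` on `wiki`,
    or None if the item has no article there."""
    triples = list(triples)
    # schema:about ties the article node to the item (needed in a full dump).
    about = {s for s, p, o in triples if p == SCHEMA + "about" and o == item}
    # schema:isPartOf selects the wiki, as in the rdflib example above.
    on_wiki = {s for s, p, o in triples if p == SCHEMA + "isPartOf" and o == wiki}
    matches = about & on_wiki
    if not matches:
        return None
    url = matches.pop()  # an item has at most one sitelink per wiki
    names = [o for s, p, o in triples if s == url and p == SCHEMA + "name"]
    # Strip the quotes and language tag from the literal, e.g. "X"@en -> X.
    title = re.match(r'"([^"]*)"', names[0]).group(1) if names else None
    return url, title


if __name__ == "__main__":
    print(sitelink(parse_ntriples(SAMPLE), "http://www.wikidata.org/entity/Q26"))
```

Filtering on schema:about as well as schema:isPartOf is what lets the same function work on a full dump containing many items, not just a single-entity file like Q26.nt.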

Dan Scott