I'm following this guide on querying from Wikidata.

I can get a specific entity (if I know its ID) with:

from wikidata.client import Client
client = Client()
entity = client.get('Q20145', load=True)
entity
>>><wikidata.entity.Entity Q20145 'IU'>
entity.description
>>>m'South Korean singer-songwriter, record producer, and actress'

But how can I get the RDF triples of that entity? That is, all the outgoing and incoming edges in the form (subject, predicate, object).

Looks like this SO question managed to get the triples, but only from a data dump here. I'm trying to get it from the library itself.

Penguin
  • SPARQL `DESCRIBE` query is not sufficient? Like `DESCRIBE wd:Q20145` sent to the Wikidata SPARQL endpoint? Or a SPARQL `CONSTRUCT {?s ?p ?o} WHERE {{BIND(wd:Q20145 as ?s) ?s ?p ?o} UNION {BIND(wd:Q20145 as ?o) ?s ?p ?o}}` query if the Blazegraph `DESCRIBE` doesn't fetch the incoming edges. – UninformedUser Sep 13 '21 at 05:41
  • Note, none of those approaches will get you the statements about statements. That can also be done via `CONSTRUCT` though, I just kept it simple here as it's not clear to what extent you want the triples – UninformedUser Sep 13 '21 at 05:41
  • @UninformedUser I'm not familiar with SPARQL as I code in Python. If there's a way to get the triples I need using some Python script that wraps the SQL, I might accept that too. The only issue I see with SQL (and again, I'm not too familiar with it) is that I have a lot of entities to query, and I'm not sure if there's a limit with SQL. Regarding "statements about statements", I'm not sure what you mean. Say I need entity `Q20145`. The triples I need are `[(Q20145, predicate_1, object_1), (Q20145, predicate_2, object_2)]`, and also `[(subject_1, predicate_1, Q20145),...]` – Penguin Sep 13 '21 at 12:58
  • 1) use `rdflib` in Python for running SPARQL queries. 2) I'm talking about statement qualifiers, which are used often in Wikidata, e.g. a start time and end time attached to some statement. – UninformedUser Sep 14 '21 at 07:21

2 Answers

If you only need the outgoing edges, you can retrieve them directly from https://www.wikidata.org/wiki/Special:EntityData/Q20145.nt and parse the file with rdflib:

from rdflib import Graph
g = Graph()
g.parse('https://www.wikidata.org/wiki/Special:EntityData/Q20145.nt', format="nt")    
for subj, pred, obj in g:
    print(subj, pred, obj)
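The file served by Special:EntityData typically contains more than the entity's own edges, for example triples whose subject is a statement node or a referenced entity. The following is a minimal sketch of one way to keep only the simple outgoing claims. It assumes the input is any iterable of 3-tuples (an rdflib Graph such as `g` above qualifies); `outgoing_direct_claims` is a hypothetical helper name, not part of rdflib.

```python
WD_ENTITY = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"


def outgoing_direct_claims(triples, qid):
    """Keep triples whose subject is the entity and whose predicate is a wdt: property.

    `triples` is any iterable of (subject, predicate, object) tuples.
    Terms are compared by their string form, so rdflib terms and plain strings both work.
    """
    entity_uri = WD_ENTITY + qid
    kept = []
    for subj, pred, obj in triples:
        if str(subj) == entity_uri and str(pred).startswith(WDT):
            kept.append((str(subj), str(pred), str(obj)))
    return kept
```

With the graph from the snippet above, `outgoing_direct_claims(g, "Q20145")` would return the filtered list. Labels, descriptions and qualifier data are dropped by this filter, so loosen the predicate check if you need them.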

To get both the incoming and the outgoing edges, you need to query the database. On Wikidata, this is done using the Wikidata Query Service and the query language SPARQL. The SPARQL expression to get all edges is as simple as DESCRIBE wd:Q20145.

With Python, you can retrieve the results of the query with the following code:

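import requests

endpoint_url = "https://query.wikidata.org/sparql"
# Placeholder User-Agent: replace it with a string that identifies your script and a contact address
headers = { 'User-Agent': 'MyBot/0.1 (mybot@example.org)' }
payload = {
    'query': 'DESCRIBE wd:Q20145',
    'format': 'json'
}
r = requests.get(endpoint_url, params=payload, headers=headers)
r.raise_for_status()  # fail early on HTTP errors such as 429 (too many requests)
results = r.json()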

triples = []
for result in results["results"]["bindings"]:   
    triples.append((result["subject"], result["predicate"], result["object"]))
print(triples)

This gives you the full result, which reflects Wikidata's complex underlying data model. If you want to query the incoming and outgoing edges separately, replace DESCRIBE wd:Q20145 with either CONSTRUCT {?s ?p ?o} WHERE {BIND(wd:Q20145 AS ?s) ?s ?p ?o} to get only the outgoing edges, or CONSTRUCT {?s ?p ?o} WHERE {BIND(wd:Q20145 AS ?o) ?s ?p ?o} to get only the incoming edges.
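To avoid hand-editing the query string for each direction, the two CONSTRUCT queries can be generated by a small helper. This is only a sketch: `build_edge_query` is a hypothetical name, and the ID check assumes you are querying items (Q-IDs) rather than properties.

```python
import re


def build_edge_query(qid, direction):
    """Return a CONSTRUCT query for the 'outgoing' or 'incoming' edges of one item."""
    # Reject anything that is not a plain item ID so nothing unexpected ends up in the query.
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    if direction == "outgoing":
        bound_var = "?s"   # the entity is the subject
    elif direction == "incoming":
        bound_var = "?o"   # the entity is the object
    else:
        raise ValueError("direction must be 'outgoing' or 'incoming'")
    return (
        "CONSTRUCT {?s ?p ?o} "
        f"WHERE {{BIND(wd:{qid} AS {bound_var}) ?s ?p ?o}}"
    )
```

The returned string can be passed as the 'query' value of the payload in the requests code above, for example `payload['query'] = build_edge_query('Q20145', 'incoming')`.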

Depending on your goal, you may want to filter out some triples, e.g. statement triples, and simplify others. One way to get a cleaner result is to replace the last four lines with:

triples = []
for result in results["results"]["bindings"]:
    # Strip the common URI prefixes so that only IDs such as Q20145 or P31 remain
    subj = result["subject"]["value"].replace('http://www.wikidata.org/entity/', '')
    obj = result["object"]["value"].replace('http://www.wikidata.org/entity/', '')  # "obj" avoids shadowing the built-in "object"
    pred = result["predicate"]["value"].replace('http://www.wikidata.org/prop/direct/', '')
    # Skip triples that involve statement nodes
    if 'statement/' in subj or 'statement/' in obj:
        continue
    triples.append((subj, pred, obj))
print(triples)
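If you have many entities to query, sending one request per entity puts more load on the shared public endpoint, which blocks clients that send too many requests in a short time. One option is to ask for several entities per request with a SPARQL VALUES clause. The sketch below only builds the query strings; `chunked` and `build_batch_query` are hypothetical helper names, and a suitable batch size is an assumption you would need to tune against the endpoint's timeout.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` containing at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batch_query(qids):
    """Return one CONSTRUCT query for the outgoing edges of all given item IDs."""
    values = " ".join(f"wd:{qid}" for qid in qids)
    return f"CONSTRUCT {{?s ?p ?o}} WHERE {{ VALUES ?s {{ {values} }} ?s ?p ?o }}"
```

For example, `[build_batch_query(batch) for batch in chunked(all_qids, 50)]` gives one query per 50 entities (where `all_qids` is your own list of IDs); pausing between requests, for instance with `time.sleep`, keeps the request rate low.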
Pascalco
  • Thanks! A few questions. 1. Can you explain how to only get the outgoing edges method using python? The link just downloads a file. 2. Is there a limit to the number of SQL queries I can do? 3. This seems to be both the outgoing and incoming edges, is there a way to split them into 2 queries? – Penguin Sep 13 '21 at 19:52
  • 1. It is just a line-based text file, so you can load it with Python, can't you? If you need a better native API in Python, use `rdflib`. 2. Indeed, the public endpoint has to be used gracefully, it's a shared medium. Too many requests in a short time will be blocked. You can always download the full dump and load it into your own triple store. 3. As I said in my comment, use a SPARQL `CONSTRUCT` query, in that case two queries: `CONSTRUCT {?s ?p ?o} WHERE {BIND(wd:Q20145 AS ?s) ?s ?p ?o}` and `CONSTRUCT {?s ?p ?o} WHERE {BIND(wd:Q20145 AS ?o) ?s ?p ?o}` – UninformedUser Sep 14 '21 at 07:19
  • I edited my answer and incorporated UninformedUser's comment. – Pascalco Sep 14 '21 at 16:36

But how can I get the RDF triples of that entity?

With a SPARQL DESCRIBE query (source), you get a single RDF graph containing all the outgoing and incoming edges in the form (subject, predicate, object). This can be done with the following Python example code (source):

from SPARQLWrapper import SPARQLWrapper, JSON

queryString = """DESCRIBE wd:Q20145"""
sparql = SPARQLWrapper("https://query.wikidata.org/sparql")

sparql.setQuery(queryString)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for result in results["results"]["bindings"]:
    print(result)

If you want only the outgoing edges, use CONSTRUCT {?s ?p ?o} WHERE {BIND(wd:Q20145 AS ?s) ?s ?p ?o}; for only the incoming edges, use CONSTRUCT {?s ?p ?o} WHERE {BIND(wd:Q20145 AS ?o) ?s ?p ?o} (thanks to @UninformedUser).

Example code:

from SPARQLWrapper import SPARQLWrapper, JSON

queryString = """CONSTRUCT {?s ?p ?o} WHERE {BIND(wd:Q20145 AS ?s) ?s ?p ?o}"""
sparql = SPARQLWrapper("https://query.wikidata.org/sparql")

sparql.setQuery(queryString)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for result in results["results"]["bindings"]:
    print(result)

The result with DESCRIBE and CONSTRUCT can be seen here and here respectively.

R. Marolahy