I am trying to get all of the pages in a given category from Wikipedia, including those in subcategories. That part works, but I also want certain fields for each page, such as birth date.

From this topic I gather that I need to use https://wikidata.org/w/api.php rather than, for example, https://pl.wikipedia.org/...

I assumed I should use a generator, but when I call Wikidata I get an error about a bad ID, which I don't get from Wikipedia.

query.params = {
 "action": "query", // placeholder for test
 "generator": "categorymembers",
 "gcmpageid": 1810130, // sophists' category at pl.wikipedia
 "format": "json"
}

I've tried using the ID from Wikidata prefixed with "Q", but then I get a badinteger error.

Alternatively, I could query Wikipedia for the IDs and then query Wikidata, but that means making two calls for the same thing and passing all those IDs into the second request...

Please help

adamgorszy

1 Answer

TL;DR: Using generators from Polish Wikipedia in the Wikidata API does not work, but other solutions exist.

A few things to note about Wikidata and its API:

  • Wikidata doesn't know anything about the category hierarchies on Polish Wikipedia (or on any other Wikipedia language version).
  • There is no API to query pages in all subcategories. This is mainly because the category system of MediaWiki allows cycles in the category hierarchy and arbitrarily deep nesting of categories.
  • Page IDs are only unique within a project, so a page ID from pl.wikipedia.org does not work on https://en.wikipedia.org/w/api.php or https://www.wikidata.org/w/api.php.
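
Because of the possible cycles, any recursive walk over subcategories you write yourself should remember which categories it has already visited and cap the depth. A minimal Python sketch of that idea follows. `collect_pages`, `fetch_members` and `FAKE_WIKI` are hypothetical names, not part of any API; `fetch_members` stands in for whatever function calls `categorymembers` on pl.wikipedia.org and is assumed to return dicts with `title` and `type` keys, as the API does when asked for `cmprop=title|type`.

```python
from collections import deque
from typing import Callable, Dict, List, Set


def collect_pages(
    root: str,
    fetch_members: Callable[[str], List[Dict[str, str]]],
    max_depth: int = 5,
) -> Set[str]:
    """Breadth-first walk over a category tree that tolerates cycles."""
    seen_categories = {root}      # categories already queued or visited
    pages: Set[str] = set()       # article titles found so far
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        for member in fetch_members(category):
            if member["type"] == "subcat":
                # Skip categories we have seen (cycles) or that are too deep.
                if depth < max_depth and member["title"] not in seen_categories:
                    seen_categories.add(member["title"])
                    queue.append((member["title"], depth + 1))
            elif member["type"] == "page":
                pages.add(member["title"])
    return pages


# Made-up category graph with a cycle (A -> B -> A), for illustration only.
FAKE_WIKI = {
    "Category:A": [
        {"title": "Page 1", "type": "page"},
        {"title": "Category:B", "type": "subcat"},
    ],
    "Category:B": [
        {"title": "Page 2", "type": "page"},
        {"title": "Category:A", "type": "subcat"},
    ],
}


def fake_fetch(category: str) -> List[Dict[str, str]]:
    """Stand-in for a real API call; looks the category up in FAKE_WIKI."""
    return FAKE_WIKI.get(category, [])
```

Passing the fetch function in as a parameter keeps the traversal separate from the HTTP code, so the cycle handling can be checked against a small fake graph before pointing it at the real API.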

There are multiple solutions to your problem:

  1. Use the query in your question recursively to get all page titles from Kategoria:Sofiści and its subcategories. Afterwards, use the Wikidata API to retrieve the Wikidata item for each Polish Wikipedia article. For example, for Protagoras the query is: https://www.wikidata.org/w/api.php?action=wbgetentities&sites=plwiki&titles=Protagoras&props=claims&format=json This returns a JSON document with all statements about Protagoras stored on Wikidata. The birth date is found in that document under claims->P569->mainsnak->datavalue->value->time.

  2. Use the Wikidata Query Service. It allows you to call the MediaWiki API from SPARQL.

SELECT ?item ?itemLabel ?date_of_birth WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:api "Generator" .
    bd:serviceParam wikibase:endpoint "pl.wikipedia.org" .
    bd:serviceParam mwapi:gcmtitle "Kategoria:Sofiści" .
    bd:serviceParam mwapi:generator "categorymembers" .
    bd:serviceParam mwapi:gcmprop "ids|title|type" .
    bd:serviceParam mwapi:gcmlimit "max" .
    ?item wikibase:apiOutputItem mwapi:item .
  }
  ?item wdt:P569 ?date_of_birth
  SERVICE wikibase:label { bd:serviceParam wikibase:language "pl". }
}

Paste this query into https://query.wikidata.org/. That page also offers code examples showing how to access the results programmatically. The drawback of this solution is that pages in subcategories are not included.

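  3. Rely fully on Wikidata. The following query, run at https://query.wikidata.org/, selects items whose occupation (P106) is wd:Q3750514, independently of the Polish Wikipedia category:
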
SELECT ?item ?itemLabel ?date_of_birth WHERE {
  ?item wdt:P106 wd:Q3750514.
  ?item wdt:P569 ?date_of_birth
  SERVICE wikibase:label { bd:serviceParam wikibase:language "pl,en". }
}
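
For solution 1, the second step (asking Wikidata about the titles you collected and pulling out the birth date) could look roughly like the Python sketch below. It only builds the request URL and parses a response that has already been decoded from JSON, and it leaves the HTTP call to you. `build_entities_url`, `birth_dates` and `SAMPLE_RESPONSE` are hypothetical names, and the sample data is made up to mirror the claims->P569->mainsnak->datavalue->value->time path described above. It assumes that `wbgetentities` accepts several titles separated by `|` when a single site is given.

```python
from typing import Dict, Iterable, Optional
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_entities_url(titles: Iterable[str], site: str = "plwiki") -> str:
    """Build a wbgetentities URL for a batch of article titles on one wiki."""
    params = {
        "action": "wbgetentities",
        "sites": site,
        "titles": "|".join(titles),  # assumption: several titles per request
        "props": "claims",
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def birth_dates(response: dict) -> Dict[str, Optional[str]]:
    """Map each entity ID in a decoded response to its first P569 time, or None."""
    result: Dict[str, Optional[str]] = {}
    for entity_id, entity in response.get("entities", {}).items():
        time = None
        for statement in entity.get("claims", {}).get("P569", []):
            datavalue = statement.get("mainsnak", {}).get("datavalue")
            if datavalue:  # missing when the value is recorded as unknown
                time = datavalue["value"]["time"]
                break
        result[entity_id] = time
    return result


# Made-up response fragment; the IDs and the date are placeholders, not real data.
SAMPLE_RESPONSE = {
    "entities": {
        "Q_EXAMPLE_1": {
            "claims": {
                "P569": [
                    {
                        "mainsnak": {
                            "snaktype": "value",
                            "datavalue": {"value": {"time": "-0490-00-00T00:00:00Z"}},
                        }
                    }
                ]
            }
        },
        "Q_EXAMPLE_2": {"claims": {"P569": [{"mainsnak": {"snaktype": "somevalue"}}]}},
    }
}
```

Entities without a usable birth date come back as `None` rather than raising, which is convenient when a category contains pages that are not about people.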
Pascalco
  • I really hoped I could use a generator with Wikidata, but I guess, as I thought, I am left with fetching from Wikipedia first and then from Wikidata, as you described in (1). I could go with 2 or 3, but SPARQL doesn't look that nice ;). Thanks for the help! – adamgorszy Sep 18 '21 at 14:15