
I've been searching for a while for a way to retrieve something like a "main category" for each Wikipedia article. I'm using the Wikipedia API to retrieve the data, but I get an array of multiple category objects instead of one concise category.

I've seen this implemented elsewhere. For example, Facebook on this page shows "Harry Potter and the Deathly Hallows: Part II", and above the title there is a category label that says "MOVIE". The same kind of label appears for everything: it could be "BOOKS", "MUSIC", "ARTISTS", or "ANIMALS". That label is what I'd like to get from the API, because I want to run searches using this specific category. (I suspect Facebook is consuming Wikipedia's API, because the page says "FROM WIKIPEDIA, THE FREE ENCYCLOPEDIA", and the text always looks like a copy of the original Wikipedia article.)

I've spent quite a while reading the docs for the Wikipedia/MediaWiki API but haven't found anything that helps so far. I've also read this question, but the answer is not really helpful in my case and it's from two years ago.

Here is an example of how I'm consuming the API. This request searches for "Harry Potter" and limits the response to 3 results: https://es.wikipedia.org/w/api.php?format=jsonfm&action=query&generator=search&gsrnamespace=0&gsrsearch=Harry%20Potter&gsrlimit=3&prop=pageimages|categories&pilimit=max&utf8=1&exlimit=max

Any help or recommendation on how to achieve this is appreciated.

Enmanuel Duran

1 Answer

Wikipedia has no concept of one category being more "main" than the others, and the ordering does not help either: it reflects the order in the page source, which typically means automatically generated categories come first and the important categories come last. Your best bet is probably to use the Wikidata API and fetch the value of the "instance of" property (P31 on Wikidata). For example, "Harry Potter and the Deathly Hallows: Part II" is an instance of a movie.
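
A minimal Python sketch of that lookup follows. It is only an illustration, not a definitive implementation: it assumes the Wikipedia page is linked to a Wikidata item (exposed through the `wikibase_item` page property), that the item carries at least one "instance of" (P31) claim, and that the helper names (`api_get`, `extract_item_id`, `extract_instance_of_ids`, `instance_of_labels`) and the `User-Agent` string are made up for this example. It first asks Wikipedia for the Wikidata item ID, then asks Wikidata for the item's claims, then resolves the claimed class IDs to readable labels.

```python
import json
import urllib.parse
import urllib.request

WIKIPEDIA_API = "https://es.wikipedia.org/w/api.php"  # same wiki as in the question
WIKIDATA_API = "https://www.wikidata.org/w/api.php"
INSTANCE_OF = "P31"  # Wikidata property ID for "instance of"


def api_get(endpoint, params):
    """Send a GET request to a MediaWiki API endpoint and decode the JSON reply."""
    query = urllib.parse.urlencode({**params, "format": "json"})
    request = urllib.request.Request(
        f"{endpoint}?{query}",
        # Hypothetical client name; replace it with something identifying your app.
        headers={"User-Agent": "main-category-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def extract_item_id(pages_response):
    """Return the Wikidata item ID of the first page that has one, else None."""
    pages = pages_response.get("query", {}).get("pages", {})
    for page in pages.values():
        item_id = page.get("pageprops", {}).get("wikibase_item")
        if item_id:
            return item_id
    return None


def extract_instance_of_ids(entity):
    """Collect the item IDs that an entity's "instance of" claims point to."""
    ids = []
    for claim in entity.get("claims", {}).get(INSTANCE_OF, []):
        snak = claim.get("mainsnak", {})
        # Skip "novalue"/"somevalue" snaks, which carry no datavalue.
        if snak.get("snaktype") == "value":
            ids.append(snak["datavalue"]["value"]["id"])
    return ids


def instance_of_labels(title, language="en"):
    """Look up a Wikipedia title and return the labels of its "instance of" values."""
    pages = api_get(WIKIPEDIA_API, {
        "action": "query", "titles": title, "redirects": 1,
        "prop": "pageprops", "ppprop": "wikibase_item",
    })
    item_id = extract_item_id(pages)
    if item_id is None:
        return []  # page missing or not linked to Wikidata

    entities = api_get(WIKIDATA_API, {
        "action": "wbgetentities", "ids": item_id, "props": "claims",
    })
    class_ids = extract_instance_of_ids(entities["entities"][item_id])
    if not class_ids:
        return []

    labels = api_get(WIKIDATA_API, {
        "action": "wbgetentities", "ids": "|".join(class_ids),
        "props": "labels", "languages": language,
    })
    # Fall back to the raw item ID when no label exists in the requested language.
    return [
        labels["entities"][cid].get("labels", {}).get(language, {}).get("value", cid)
        for cid in class_ids
    ]
```

Calling `instance_of_labels(title)` with a page title from your search results should give you a short list of class labels to use as the "main category". An item can have several "instance of" values, or none, so you may still need your own rule for picking one or for mapping them onto a fixed set of categories.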

Tgr