
I have a list of articles and I want to find the main category of each article.

Wikipedia lists its main categories here: http://en.wikipedia.org/wiki/Portal:Contents/Categories

I am able to find the categories each article belongs to using:

http://en.wikipedia.org/w/api.php?action=query&prop=categories&titles=%s&format=xml
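A minimal Python sketch of the same query, assuming only the standard library. It builds the URL with `urlencode`, so titles containing spaces or colons are escaped properly instead of being pasted into `%s`, and it collects the `title` attribute of every `<cl>` element in the XML response. The `cllimit=max` parameter is an addition not in the URL above (the API returns at most 10 categories by default). Continuation (`clcontinue`) is not handled, and the User-Agent string is a placeholder.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def categories_query_url(title: str) -> str:
    """Build the prop=categories query for one page (or category) title."""
    params = {
        "action": "query",
        "prop": "categories",
        "titles": title,  # urlencode escapes spaces, colons, etc.
        "format": "xml",
        "cllimit": "max",  # added: the default is only 10 categories
    }
    return API_ENDPOINT + "?" + urlencode(params)


def parse_category_titles(xml_text: str) -> list[str]:
    """Collect the title attribute of every <cl> element in the response."""
    root = ET.fromstring(xml_text)
    return [cl.get("title") for cl in root.iter("cl")]


def fetch_categories(title: str) -> list[str]:
    """Network call; replace the placeholder User-Agent with your own."""
    request = Request(
        categories_query_url(title),
        headers={"User-Agent": "category-lookup-sketch/0.1 (you@example.org)"},
    )
    with urlopen(request) as response:
        return parse_category_titles(response.read().decode("utf-8"))
```

Passing a category title such as `"Category:Domesticated animals"` instead of an article title should return that category's own parent categories, since categories are pages too.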

I am also able to check whether a page is in a given category:

http://en.wikipedia.org/w/api.php?action=query&titles=Dog&prop=categories&clcategories=Domesticated animals&format=xml

This tells me whether Dog is in the category "Domesticated animals", but that is not quite what I want. I want to find out which main category "Domesticated animals" itself belongs to. Is this possible using the API?

  • The page you are referring to is manually curated and does not technically list "main categories". This is closer: https://en.wikipedia.org/wiki/Category:Main_topic_classifications but technically it is not the top category either (it is placed in the category Contents). – Ainali Aug 30 '14 at 06:51
  • However, if you want to find out which categories Domesticated animals is in, use: http://en.wikipedia.org/w/api.php?action=query&prop=categories&format=xml&cllimit=10&titles=Category%3ADomesticated%20animals As you can see, it belongs to three categories. You can repeat that API call with each of them and "climb up" the category taxonomy. – Ainali Aug 30 '14 at 06:59
  • @Ainali Thank you so much! I'll try out that method. – user3746644 Aug 31 '14 at 13:10
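The "climb up" approach from the comment above can be sketched as a breadth-first search. A visited set stops category cycles from looping forever, and `max_depth` caps the number of API calls. `get_parents` is a hypothetical function you supply, for example one that runs the `prop=categories` query with `titles=Category:<name>` and returns the parent titles; `targets` is whatever set you choose to treat as "main" categories (an assumption, e.g. the subcategories of Category:Main_topic_classifications). The search returns the shortest chain it finds, which may be only one of several routes to a main category.

```python
from collections import deque
from typing import Callable, Iterable, Optional


def path_to_target(
    start: str,
    get_parents: Callable[[str], Iterable[str]],
    targets: set[str],
    max_depth: int = 10,
) -> Optional[list[str]]:
    """Climb parent categories breadth-first from `start`.

    Returns the shortest chain [start, ..., target] to any category in
    `targets`, or None if none is reached within `max_depth` levels.
    """
    if start in targets:
        return [start]
    # Back-pointers double as the visited set, so cycles cannot loop forever.
    came_from: dict[str, Optional[str]] = {start: None}
    queue = deque([(start, 0)])
    while queue:
        category, depth = queue.popleft()
        if depth >= max_depth:
            continue  # depth cap: bounds the number of get_parents calls
        for parent in get_parents(category):
            if parent in came_from:
                continue  # already reached by a route at least as short
            came_from[parent] = category
            if parent in targets:
                # Follow back-pointers from the target down to `start`.
                path = [parent]
                while came_from[path[-1]] is not None:
                    path.append(came_from[path[-1]])
                return path[::-1]
            queue.append((parent, depth + 1))
    return None


# Toy parent map (made up, not real Wikipedia data); "B" -> "Start" is a cycle.
toy_parents = {
    "Start": ["A", "B"],
    "A": ["D"],
    "B": ["C", "Start"],
    "C": ["Root"],
}

print(path_to_target("Start", lambda c: toy_parents.get(c, []), {"Root"}))
# -> ['Start', 'B', 'C', 'Root']
```

Breadth-first order is used so that the first target reached is also the nearest one; a depth-first climb could wander far up an unrelated branch before finding anything.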

1 Answer


First, there is no such thing as a "Wikipedia API". There is a MediaWiki (web) API. Knowing this will help you find information on the existing tools. https://www.mediawiki.org/wiki/API:Main_Page

The documentation shows that no API module does the category recursion for you. Why? Because 1) it's extremely inefficient, and 2) the recursion might go anywhere or never end.

However, there is now a solution by Magnus Manske:

https://tools.wmflabs.org/catscan2/reverse_tree.php?doit=1&language=en&project=wikipedia&title=Dog&namespace=0

For Dog it reports "Maximum depth: 61 levels" and "Total categories along the way: 7988". Using that definition, the "root" category for [[Dog]], i.e. the farthest parent category, is "Industry by country". Probably not what you expected! However, from the English Wikipedia's perspective, the root category of any article is always the same: [[Category:Contents]].

Nemo