I'm trying to figure out if there's a way to determine whether a given article refers to a Person, Organization or Location. I imagine the answer lies somewhere in the "categories" and "clcategories" parameters... however, here's the issue.

Take Albert Einstein for example. The results for the query:

https://en.wikipedia.org/w/api.php?action=query&titles=Albert%20Einstein&prop=categories&clcategories=Category:People%20from%20Berlin

...show me that, indeed, Albert Einstein is a member of the category "People from Berlin".

Similarly, just by browsing through the Category tree on Wikipedia, I can show that "People from Berlin" is a subcategory of the category "People", via this path:

People > People_categories_by_parameter > People by place > People by city > People by country and city > People by city in Germany > People from Berlin

However, Albert Einstein isn't (directly) a member of the category "People", so this query:

https://en.wikipedia.org/w/api.php?action=query&titles=Albert%20Einstein&prop=categories&clcategories=Category:People

...gets me no results under Categories, i.e. it's not a match.
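The direct-membership check described above can be sketched in Python. This is only an illustration: the helper names `build_category_check_url` and `is_direct_member` are made up here, the sample responses are hand-written rather than live data (the page ID is a placeholder), and the parsing assumes the default `format=json` layout in which `pages` is keyed by page ID and a non-matching page simply has no `categories` key.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_category_check_url(title: str, category: str) -> str:
    """Build a query URL asking whether `title` is directly in `category`."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "categories",
        "clcategories": category,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def is_direct_member(response: dict) -> bool:
    """Return True if any page in the decoded response lists a category."""
    pages = response.get("query", {}).get("pages", {})
    return any(page.get("categories") for page in pages.values())


# Hand-written samples shaped like the API's JSON output (not live data).
match = {"query": {"pages": {"736": {
    "title": "Albert Einstein",
    "categories": [{"ns": 14, "title": "Category:People from Berlin"}],
}}}}
no_match = {"query": {"pages": {"736": {"title": "Albert Einstein"}}}}

print(build_category_check_url("Albert Einstein", "Category:People from Berlin"))
print(is_direct_member(match), is_direct_member(no_match))
```

Fetching the URL and decoding the JSON is left out on purpose, so the sketch runs without network access.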

Is there some way to find out whether a page is a member of any Category X, where category X is a descendant of a specified Category Y?

Thanks!

Nemo
DanM
  • There's no API for that. You can only browse the category tree as you already do. Maybe check out the http://toolserver.org/~dapete/catgraph/ tool, if that helps you (the `dot` file format should be easily parseable) – Bergi Sep 16 '13 at 15:35
  • Oy. To make life more difficult, I've just read that Wikipedia's category "tree" is in fact not a tree at all, but a directed graph which contains lots of circuits... and doing a one-level-per-query tree traversal sounds less than optimal. A possible "good enough" solution would be to just check if a page is in a category beginning with a particular word or words, e.g. "People..." or "Companies..." ... is there some way to do that? It looks like Generators may be useful, but I've so far been unable to figure out how. – DanM Sep 16 '13 at 18:27
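For what it's worth, the traversal discussed in these comments can be made safe against circuits with a visited set. The following is a rough sketch, not an API feature: `is_descendant_member`, `get_parents` and `max_depth` are hypothetical names, and in real use `get_parents` would presumably issue one `prop=categories` query per category page. Here it is replaced by a small in-memory graph built from the path in the question, with one artificial cycle added.

```python
from collections import deque
from typing import Callable, Iterable


def is_descendant_member(
    page_categories: Iterable[str],
    target: str,
    get_parents: Callable[[str], Iterable[str]],
    max_depth: int = 10,
) -> bool:
    """Breadth-first search upward from the page's categories to `target`.

    The visited set keeps cycles in the category graph from looping forever,
    and max_depth bounds how many levels (and so lookups) are explored.
    """
    visited = set(page_categories)
    queue = deque((category, 0) for category in visited)
    while queue:
        category, depth = queue.popleft()
        if category == target:
            return True
        if depth >= max_depth:
            continue
        for parent in get_parents(category):
            if parent not in visited:
                visited.add(parent)
                queue.append((parent, depth + 1))
    return False


# Toy parent graph taken from the path in the question (not fetched data).
PARENTS = {
    "People from Berlin": ["People by city in Germany"],
    "People by city in Germany": ["People by country and city"],
    "People by country and city": ["People by city"],
    "People by city": ["People by place"],
    "People by place": ["People_categories_by_parameter"],
    "People_categories_by_parameter": ["People"],
    "People": ["People by place"],  # artificial cycle for illustration
}


def toy_parents(category: str) -> list:
    return PARENTS.get(category, [])


print(is_descendant_member(["People from Berlin"], "People", toy_parents))
```

The depth limit is a design choice to cap the number of requests; a search that stops early can miss a real ancestor, so a negative answer only means "not found within `max_depth` levels".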

2 Answers

I don't know of a Wikipedia-API way to do this, but I can think of a Freebase way. The following Freebase query will get you the Freebase "types" associated with a given Wikipedia article. "People", "Politicians", "Artists", "Places", etc. are all easily recognizable from those types.

{
  "key": [{
    "namespace": "/wikipedia/en",
    "value": "William_Ambrose"
  }],
  "type": []
}

(Replace en with the actual Wikipedia language, of course, and "William_Ambrose" with the Wikipedia article name. See my note below on escaping, though!)

The result, in this case, is:

{
  "result": {
    "type": [
      "/common/topic",
      "/people/person",
      "/people/deceased_person",
      "/government/politician"
    ],
    "key": [{
      "namespace": "/wikipedia/en",
      "value": "William_Ambrose"
    }]
  }
}

... which clearly means that's a "Person" and a "Politician" (and also a "deceased person" at that, but that's another matter.)
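As a small illustration, the returned "type" list could be mapped to the three labels from the question like this. The `TYPE_LABELS` table and the `classify` helper are assumptions made for this sketch; in particular, the organization and location type IDs are my guess at the relevant Freebase types and do not appear in the result above.

```python
from typing import List

# Assumed mapping from Freebase type IDs to the labels in the question.
TYPE_LABELS = {
    "/people/person": "Person",
    "/organization/organization": "Organization",
    "/location/location": "Location",
}


def classify(response: dict) -> List[str]:
    """Return the labels whose Freebase type ID appears in the response."""
    types = response.get("result", {}).get("type", [])
    return [label for type_id, label in TYPE_LABELS.items() if type_id in types]


# The result shown above, as a decoded Python dict.
sample = {
    "result": {
        "type": [
            "/common/topic",
            "/people/person",
            "/people/deceased_person",
            "/government/politician",
        ],
        "key": [{"namespace": "/wikipedia/en", "value": "William_Ambrose"}],
    }
}

print(classify(sample))
```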

See my answer to "get wikipedia linked links" for notes on how the API works, and a REST example. In particular, take a good look at the notes on getting API keys from Google and on Freebase-escaping the strings.

Good luck.

Nitzan Shaked
  • This is looking like an excellent option. I'd never even heard of Freebase. Thanks! – DanM Sep 17 '13 at 15:31

Nowadays you should ask Wikidata, whose property P31 ("instance of") will tell you things like "is a human".
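A minimal sketch of reading P31 from a `wbgetentities` response might look like the following. The helper names are invented for this example, and the sample response is hand-written to resemble the claims layout rather than copied from a live call. Q5 is the Wikidata item for "human".

```python
from typing import List
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
HUMAN = "Q5"  # Wikidata item for "human"


def build_entity_url(title: str, site: str = "enwiki") -> str:
    """Build a wbgetentities URL that looks an item up by Wikipedia title."""
    params = {
        "action": "wbgetentities",
        "sites": site,
        "titles": title,
        "props": "claims",
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def instance_of_ids(response: dict) -> List[str]:
    """Collect the item IDs of every P31 ("instance of") claim."""
    ids = []
    for entity in response.get("entities", {}).values():
        for claim in entity.get("claims", {}).get("P31", []):
            snak = claim.get("mainsnak", {})
            if snak.get("snaktype") == "value":  # skip novalue/somevalue
                ids.append(snak["datavalue"]["value"]["id"])
    return ids


# Hand-written sample shaped like a wbgetentities response (not live data).
sample = {"entities": {"Q937": {"claims": {"P31": [{
    "mainsnak": {
        "snaktype": "value",
        "property": "P31",
        "datavalue": {
            "value": {"entity-type": "item", "id": "Q5"},
            "type": "wikibase-entityid",
        },
    },
}]}}}}

print(build_entity_url("Albert Einstein"))
print(HUMAN in instance_of_ids(sample))
```

Organizations and locations are usually tagged with more specific items (a particular kind of company or settlement), so matching a single ID as done for humans here is likely to be incomplete for those two cases.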

Nemo