I am writing an app that takes the Wikipedia page the user currently has open as its input. My module uses web scraping and natural language processing to generate a list of keywords related to that article.

I want to extend the app so that, in addition to the keywords I have identified, it offers a set of related topics that may interest the user. Does Wikipedia provide an API that does this? If not, can anybody point me to what I should look into (in case I have to write the code from scratch)? I would also appreciate pointers to an algorithm that trains a machine to identify topic maps. I am not looking for a paper, but for a practical implementation of something basic.

To summarize:

  1. I need a way to find topics related to the current Wikipedia article (categories will also do).
  2. I would also appreciate a sample algorithm for training a machine to identify topics that are usually related and cluster together.

P.S. Please be specific; I have already researched a number of the obvious possibilities. Thank you.

Dr narendra thorat
  • If you want to get the categories of a certain article, then, yeah, those are available through [the API](http://www.mediawiki.org/wiki/API:Main_page). – svick Mar 18 '12 at 18:21
  • I have already incorporated that, but I want the names of articles similar to the current one. Even just the related categories would do. – Dr narendra thorat Mar 19 '12 at 04:12

2 Answers


"See also" is a section often present in Wikipedia pages. It is structured like the example below, from [[Article (publishing)]]:

==See also==
* [[Article directory]]
* [[Electronic article]]

You would then parse the wikitext (available from the dumps or through the MediaWiki API, as hinted in the previous answers) and use the articles listed in that section.
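As a rough illustration of that parsing step, here is a minimal sketch that pulls the link targets out of a "See also" section with a regular expression. The function name `see_also_links` is my own, and the regex is a simplification that assumes plain `[[Target]]` or `[[Target|label]]` links under a level-2 heading; a real wikitext parser would handle templates and other edge cases better.

```python
import re

# Matches the "See also" heading and captures everything up to the next
# level-2 heading (or the end of the text).
SEE_ALSO = re.compile(
    r"^==[ \t]*See also[ \t]*==[ \t]*$(.*?)(?=^==[^=]|\Z)",
    flags=re.MULTILINE | re.DOTALL | re.IGNORECASE,
)

# Captures the target of [[Target]], [[Target|label]] or [[Target#Section]].
LINK_TARGET = re.compile(r"\[\[([^\]|#]+)")


def see_also_links(wikitext):
    """Return the article titles linked from the 'See also' section."""
    match = SEE_ALSO.search(wikitext)
    if match is None:
        return []
    return [target.strip() for target in LINK_TARGET.findall(match.group(1))]
```

Applied to the two-item example above, this returns `['Article directory', 'Electronic article']`; links in later sections such as "References" are ignored.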

Another way is to use the Wikipedia categories directly; there are APIs for that.
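For the category route, a minimal sketch using only the Python standard library might look like the following. It relies on the MediaWiki `action=query` module with `prop=categories`; the English Wikipedia endpoint, the helper names, and the User-Agent string are my own assumptions, and result continuation (for pages with very many categories) is left out.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia; other language editions use another host.
API_URL = "https://en.wikipedia.org/w/api.php"


def categories_url(title):
    """Build the query URL that asks for the visible categories of a page."""
    params = {
        "action": "query",
        "prop": "categories",
        "titles": title,
        "cllimit": "max",
        "clshow": "!hidden",  # skip hidden maintenance categories
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_categories(response):
    """Extract category names (without the 'Category:' prefix) from the JSON reply."""
    names = []
    for page in response.get("query", {}).get("pages", {}).values():
        for category in page.get("categories", []):
            names.append(category["title"].split(":", 1)[-1])
    return names


def fetch_categories(title):
    """Fetch and parse the categories of one article (performs a network request)."""
    request = urllib.request.Request(
        categories_url(title),
        headers={"User-Agent": "related-topics-sketch/0.1"},  # hypothetical name
    )
    with urllib.request.urlopen(request) as reply:
        return parse_categories(json.load(reply))
```

Calling `fetch_categories("Article (publishing)")` should give the categories of that article; the other members of each category can then be requested through the API as candidate related articles.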

Aubrey

You can scrape the categories if you want. If you are working with Python, you can read the wikitext directly from the Wikipedia API and use mwlib to parse the article and find the links.
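To give an idea of the link-finding step without committing to mwlib's API, here is a minimal sketch that uses a regular expression as a stand-in for a real parser. It assumes the wikitext came from the API's `action=parse` module with `prop=wikitext` in the default JSON format, where the text sits under `parse` → `wikitext` → `*`. The function names are my own, and the namespace filter is deliberately crude.

```python
import re
from collections import Counter

# [[Target]], [[Target|label]] and [[Target#Section]]; group 1 is the target.
LINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def wikitext_from_parse_response(response):
    """Pull the raw wikitext out of an action=parse&prop=wikitext JSON reply."""
    return response["parse"]["wikitext"]["*"]


def article_links(wikitext):
    """Count how often each article is linked from the given wikitext."""
    counts = Counter()
    for target in LINK.findall(wikitext):
        target = target.strip()
        # Crude filter: skips File:, Category: and interwiki links, but also
        # drops the few article titles that contain a colon themselves.
        if not target or ":" in target:
            continue
        # Wikipedia titles are case-insensitive in their first letter.
        counts[target[0].upper() + target[1:]] += 1
    return counts
```

`article_links(text).most_common(20)` then gives the most frequently linked articles, which are often reasonable candidates for related topics.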

A more interesting, but harder to implement, approach would be to create clusters of related terms and then, given the list of terms extracted from an article, find the terms closest to them.
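One very basic way to try this idea, not necessarily the best one, is to represent each term by the terms it co-occurs with across many articles and compare those vectors with cosine similarity. The sketch below assumes you already have one keyword list per article (for example, from your existing extractor); all names are illustrative, and a real system would more likely use TF-IDF weighting or a proper clustering library.

```python
import math
from collections import defaultdict


def cooccurrence_vectors(keyword_lists):
    """Map each term to a sparse vector counting the terms it appears with."""
    vectors = defaultdict(lambda: defaultdict(int))
    for keywords in keyword_lists:
        unique = set(keywords)
        for a in unique:
            for b in unique:
                if a != b:
                    vectors[a][b] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def closest_terms(term, vectors, top_n=3):
    """Return up to top_n terms whose co-occurrence vectors are most similar."""
    if term not in vectors:
        return []
    scored = [
        (cosine(vectors[term], vector), other)
        for other, vector in vectors.items()
        if other != term
    ]
    # Highest similarity first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scored[:top_n] if score > 0]
```

Build the vectors once from a corpus of articles, then call `closest_terms` for each keyword of the current article and merge the results into the list of suggested topics.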

Not_a_Golfer