I would like to get all the articles about historical events on Wikipedia. What is the best way to go about this? The Wikipedia API? Database dumps? I've checked out DBpedia, and it seems quite limited so far (still impressive, though).
1 Answer
For a list of all articles on historical events, have a look at the Events category and especially its subcategory Events by time. To get all articles in a category and all its subcategories, you could use the API or the SQL dumps (you would need at least the tables categorylinks and page; if you're using .NET, my library could help you with that).
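As a rough sketch of the API route: the following assumes the standard MediaWiki query module list=categorymembers on the English Wikipedia endpoint and walks a category tree breadth-first. The function names, the max_depth cap, and the User-Agent string are illustrative choices, not part of any library.

```python
import json
import urllib.parse
import urllib.request
from collections import deque

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params):
    """Send one GET request to the MediaWiki API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent (an assumption); replace it with one that identifies you.
    request = urllib.request.Request(
        url, headers={"User-Agent": "events-crawler-sketch/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def category_members(category, fetch=fetch_json):
    """Yield every page and subcategory of one category, following continuation."""
    params = {
        "action": "query",
        "format": "json",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "page|subcat",
        "cmlimit": "500",
    }
    while True:
        data = fetch(params)
        yield from data["query"]["categorymembers"]
        if "continue" not in data:
            break
        # The API returns the parameters needed to request the next batch.
        params = {**params, **data["continue"]}


def articles_in_tree(root, max_depth=2, fetch=fetch_json):
    """Collect titles of main-namespace pages under root, down to max_depth levels."""
    seen = {root}  # categories already queued; guards against category cycles
    articles = set()
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        for member in category_members(category, fetch):
            if member["ns"] == 14:  # namespace 14 is Category
                if depth < max_depth and member["title"] not in seen:
                    seen.add(member["title"])
                    queue.append((member["title"], depth + 1))
            elif member["ns"] == 0:  # namespace 0 is the article namespace
                articles.add(member["title"])
    return articles
```

Calling articles_in_tree("Category:Events by time") would issue one or more requests per category, so for a tree as large as Events the SQL dumps are likely the more practical choice. The depth cap is there because Wikipedia's category graph can drift far from the starting topic.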
To actually get the text of many articles, you should definitely use the XML dumps, probably pages-articles.
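A minimal sketch of reading such a dump without loading it into memory, using only Python's standard library. It assumes the usual export layout (page elements containing a title and a revision with a text element) and strips the XML namespace rather than hard-coding it, since the schema version differs between dumps. The helper names are made up for illustration.

```python
import bz2
import xml.etree.ElementTree as ET


def open_dump(path):
    """Open a bz2-compressed dump as a binary stream, decompressing on the fly."""
    return bz2.open(path, "rb")


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on each tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream, wanted_titles=None):
    """Yield (title, wikitext) for each page element, one page at a time."""
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event != "end" or local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for node in elem.iter():
            name = local_name(node.tag)
            if name == "title":
                title = node.text
            elif name == "text":
                text = node.text or ""
        if wanted_titles is None or title in wanted_titles:
            yield title, text
        # Discard pages that are already processed so memory use stays flat.
        root.clear()
```

With a downloaded dump, something like iter_pages(open_dump(path), wanted_titles=titles) would stream through the file once and yield only the articles whose titles were collected from the category step; path and titles here stand for values you supply.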

svick
- Any recommended tools for extracting data from the XML dump? – user1530580 Feb 06 '13 at 18:29
- Also, I am curious why you would go through the Events category rather than the History one? – user1530580 Feb 06 '13 at 18:31
- @user1530580 I haven't used those much, and I think there are reasonable XML libraries in pretty much any language, so pick one. Just make sure you're not trying to load the whole XML into memory at once. And you asked for events, so I looked for a category that contains events. But only you know what exactly you need. – svick Feb 06 '13 at 20:53
- OK, thanks. I was just wondering whether there was a specific tool recommended for going through Wikipedia's XML dump, but I guess not. – user1530580 Feb 07 '13 at 01:41