I am trying to comb through Wikipedia articles and want the following information for each article:

Content in plain text, links, redirects (titles of pages that redirect to the current page), and metadata (page views and edits on a monthly basis)

The MediaWiki API is very comprehensive but also quite dense, since it is made for editing pages. I have tried both the wikipedia and mwclient modules, but neither exposes this metadata. Is there another tool that gives the read-only functionality of the API without the overhead of the API itself?

There are a few alternative parsers, but in general MediaWiki's parsing is hard to replicate (it has no formal specification and is implemented as a mess of regular expressions). You are almost always better off with the API or the database dumps. Specifically, links and redirects are available as tables in the DB dumps (the pagelinks and redirect tables). Plain text is available via the TextExtracts API (prop=extracts), and pageviews are a work in progress. I don't think aggregated edit data is available.
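For illustration, a minimal sketch of the read-only API route using only the standard library. It assumes English Wikipedia's api.php endpoint, JSON with formatversion=2, and the standard "continue" mechanism for paging. The function names and the User-Agent string are hypothetical. Since there appears to be no aggregated edit data, monthly edit counts are computed client-side from revision timestamps (prop=revisions). Pageviews are not covered here.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

API_URL = "https://en.wikipedia.org/w/api.php"  # assumption: English Wikipedia


def http_fetch(params):
    """Send one GET request to the MediaWiki API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Hypothetical User-Agent; Wikimedia asks clients to identify themselves.
    req = urllib.request.Request(url, headers={"User-Agent": "wiki-reader-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def query_all(params, fetch=http_fetch):
    """Yield every batch of an action=query request, following 'continue'."""
    base = {"action": "query", "format": "json", "formatversion": "2", **params}
    cont = {}
    while True:
        batch = fetch({**base, **cont})
        yield batch
        if "continue" not in batch:
            break
        cont = batch["continue"]  # e.g. {"plcontinue": ..., "continue": "||"}


def page_content(title, fetch=http_fetch):
    """Plain-text extract, outgoing links and incoming redirects for one page."""
    result = {"extract": None, "links": [], "redirects": []}
    params = {
        "titles": title,
        "prop": "extracts|links|redirects",
        "explaintext": "1",  # TextExtracts: plain text instead of HTML
        "pllimit": "max",
        "rdlimit": "max",
    }
    for batch in query_all(params, fetch):
        for page in batch.get("query", {}).get("pages", []):
            # Continuation batches may omit the extract, so keep the first one.
            if page.get("extract") is not None:
                result["extract"] = page["extract"]
            result["links"] += [link["title"] for link in page.get("links", [])]
            result["redirects"] += [r["title"] for r in page.get("redirects", [])]
    return result


def monthly_edit_counts(title, fetch=http_fetch):
    """Count revisions per 'YYYY-MM', aggregated locally from timestamps."""
    params = {"titles": title, "prop": "revisions",
              "rvprop": "timestamp", "rvlimit": "max"}
    counts = Counter()
    for batch in query_all(params, fetch):
        for page in batch.get("query", {}).get("pages", []):
            for rev in page.get("revisions", []):
                counts[rev["timestamp"][:7]] += 1  # "2015-03-01T..." -> "2015-03"
    return dict(sorted(counts.items()))
```

Usage would look like `page_content("Albert Einstein")` or `monthly_edit_counts("Albert Einstein")`. The `fetch` parameter exists only so the HTTP layer can be swapped out (for example, for rate limiting or testing). For heavily edited pages, downloading every revision timestamp is slow, which is one more reason to prefer the dumps for bulk work.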

Tgr