I am trying to use the Wikipedia API to get all links on all pages. Currently I'm using
but this does not seem to start at the first article and end at the last. How can I get this to generate all pages and all their links?
The English Wikipedia has approximately 1.05 billion internal links. Since the list=alllinks module returns at most 500 links per request, retrieving them all would take over two million sequential requests, so it's not realistic to get all links from the API.
Instead, you can download Wikipedia's database dumps and use those. Specifically, you want the pagelinks dump, which contains information about the links themselves, and very likely also the page dump, for mapping page IDs to page titles.
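
As a rough illustration of working with the dumps, here is a minimal Python sketch that streams rows out of a gzipped SQL dump without loading it into a database. It assumes the dump consists of mysqldump-style `INSERT INTO ... VALUES (...),(...);` lines. The function names and the file name in the usage note are my own placeholders, and the parser deliberately does not assume a column layout: the columns of pagelinks may differ between dump versions, so check the CREATE TABLE statement at the top of the file you download.

```python
import gzip
from typing import Iterator, Tuple, Union

Value = Union[int, float, str, None]

# Backslash escapes used inside quoted strings; any other escaped
# character (such as \' or \\) is taken literally.
_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "0": "\0", "Z": "\x1a"}


def parse_insert_values(line: str) -> Iterator[Tuple[Value, ...]]:
    """Yield one tuple per row from a single INSERT INTO ... VALUES line."""
    start = line.find(" VALUES ")
    if not line.startswith("INSERT INTO") or start == -1:
        return  # not a data line (comments, CREATE TABLE, etc.)
    i = start + len(" VALUES ")
    n = len(line)
    while i < n:
        if line[i] != "(":
            i += 1  # skip separators between rows and the trailing ';'
            continue
        i += 1
        row = []
        while True:
            if line[i] == "'":
                # Quoted string: read until the unescaped closing quote.
                i += 1
                buf = []
                while line[i] != "'":
                    if line[i] == "\\":
                        i += 1
                        buf.append(_ESCAPES.get(line[i], line[i]))
                    else:
                        buf.append(line[i])
                    i += 1
                i += 1  # step past the closing quote
                row.append("".join(buf))
            else:
                # Unquoted token: NULL or a number.
                j = i
                while line[j] not in ",)":
                    j += 1
                token = line[i:j]
                if token == "NULL":
                    row.append(None)
                else:
                    try:
                        row.append(int(token))
                    except ValueError:
                        row.append(float(token))
                i = j
            if line[i] == ")":
                i += 1
                break
            i += 1  # skip the comma between values
        yield tuple(row)


def iter_dump_rows(path: str) -> Iterator[Tuple[Value, ...]]:
    """Stream rows from a gzipped SQL dump, one INSERT line at a time."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            yield from parse_insert_values(line)
```

You would then loop over something like `iter_dump_rows("enwiki-latest-pagelinks.sql.gz")` (file name assumed) and interpret each tuple according to the table definition in that dump. For a file of this size, expect a single pass to take a while; importing the dump into a real database is the other common route.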