I've just set up a MediaWiki server. I wanted to export data from Wikipedia, but its Special:Export page doesn't allow a pagelink_depth higher than 0 by default. It seems that you can only change the maximum pagelink_depth by running your own MediaWiki and adjusting $wgExportMaxLinkDepth. I've now done all that, but of course my own MediaWiki has no content. So I was wondering if there was a way to bulk copy all of Wikipedia into my own server. From what I've read, this seems doable only about 100 pages at a time. If that's the case, Special:Export would be pointless in general, since you'd need to know exactly which pages you want before doing the export, which defeats the purpose altogether. Any help would be much appreciated.


2 Answers
Special:Export isn't meant for a complete export of a wiki, especially not through the web interface and with so many pages in the database. Special:Export should be used if you want to export a known page with all its contents in order to import that page (or a small number of pages) into another wiki, e.g. to export a template from one wiki and import it into another. So Special:Export does have a valid purpose, but you're trying to use it for a different use case, one it wasn't designed for ;)
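For the "small number of known pages" case, a request to Special:Export can be scripted. The following is a minimal sketch, not a definitive client: it assumes the form field names `pages`, `curonly`, `templates` and `wpDownload` match the Special:Export form of the target MediaWiki version, and the User-Agent string is a hypothetical placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_export_request(base_url: str, titles: list[str],
                         with_templates: bool = True) -> tuple[str, bytes]:
    """Return (url, form body) for a Special:Export POST.

    Assumption: the field names below match the Special:Export form
    of the target MediaWiki version.
    """
    fields = {
        # One page title per line, as typed into the form's text box.
        "pages": "\n".join(titles),
        # Only the current revision of each page, not its full history.
        "curonly": "1",
        # Ask for the XML as a file download rather than inline.
        "wpDownload": "1",
    }
    if with_templates:
        # Also export the templates those pages transclude.
        fields["templates"] = "1"
    url = f"{base_url}/index.php?title=Special:Export"
    return url, urlencode(fields).encode("utf-8")


def export_pages(base_url: str, titles: list[str], dest: str) -> None:
    url, body = build_export_request(base_url, titles)
    # Hypothetical User-Agent; Wikimedia asks clients to identify themselves.
    req = Request(url, data=body,
                  headers={"User-Agent": "export-sketch/0.1 (you@example.org)"})
    # Passing data makes urllib send a form-encoded POST.
    with urlopen(req) as resp, open(dest, "wb") as out:
        out.write(resp.read())
```

For example, `export_pages("https://en.wikipedia.org/w", ["Template:Infobox person"], "infobox.xml")` would fetch one template, and the resulting XML file can then be uploaded through Special:Import on the receiving wiki.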
If you want to export every page of a MediaWiki wiki, you should use the maintenance script dumpBackup.php (run from the command line) or one of the other backup scripts in the maintenance folder. That ensures you get what you want.
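On a wiki where you do have shell access, a dumpBackup.php run could be wrapped like this. It is a sketch only: the installation path is a hypothetical example, and it relies on dumpBackup.php writing its XML to standard output, with `--full` selecting every revision and `--current` only the latest one.

```python
import subprocess


def dump_backup_command(mw_root: str, full_history: bool = False) -> list[str]:
    """Build the dumpBackup.php command for a MediaWiki install at mw_root."""
    # --full includes every revision; --current only the latest one per page.
    mode = "--full" if full_history else "--current"
    return ["php", f"{mw_root}/maintenance/dumpBackup.php", mode]


def dump_wiki(mw_root: str, output_path: str, full_history: bool = False) -> None:
    # dumpBackup.php writes the XML dump to stdout, so redirect it to a file.
    with open(output_path, "wb") as out:
        subprocess.run(dump_backup_command(mw_root, full_history),
                       stdout=out, check=True)
```

Usage would be something like `dump_wiki("/var/www/mediawiki", "wiki-dump.xml")`, where `/var/www/mediawiki` stands in for wherever your wiki is installed.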
In the case of Wikipedia you can't access these scripts (I mentioned them for general purposes only), but the Wikimedia Foundation provides database dumps of the Wikimedia wikis, including Wikipedia.
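As a rough sketch of using those dumps, the following builds the URL of the latest "pages-articles" dump (current revisions, no history) and streams it into your own wiki with the importDump.php maintenance script, which reads export XML from standard input when no file is given. Assumptions: the `bzcat` tool is installed, `mw_root` is a hypothetical path to your MediaWiki installation, and you have the disk space and patience; importing something the size of English Wikipedia this way can take a very long time.

```python
import subprocess
import urllib.request

DUMPS_BASE = "https://dumps.wikimedia.org"


def latest_articles_dump_url(wiki: str = "enwiki") -> str:
    """URL of the most recent pages-articles dump for a wiki such as 'enwiki'."""
    return f"{DUMPS_BASE}/{wiki}/latest/{wiki}-latest-pages-articles.xml.bz2"


def download_and_import(wiki: str, mw_root: str, dest: str) -> None:
    # Download the compressed dump; for enwiki this is many gigabytes.
    urllib.request.urlretrieve(latest_articles_dump_url(wiki), dest)
    # Assumes bzcat is available: decompress and pipe the XML
    # into importDump.php's standard input.
    with subprocess.Popen(["bzcat", dest], stdout=subprocess.PIPE) as bz:
        subprocess.run(["php", f"{mw_root}/maintenance/importDump.php"],
                       stdin=bz.stdout, check=True)
```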

"So I was wondering if there was a way to bulk copy all of Wikipedia into my own server": I would recommend against this, simply because of the sheer size of the data and the vast number of open links ("redlinks" or "bad links") you would be adding if you didn't actually copy it all in. A better approach is to follow all the Wikipedia conventions about page NAMING, down to the punctuation mark, and then write a script that checks, say, once a night whether you have linked to something that is already defined in Wikipedia. If so, it imports ONLY THAT PAGE and adds a link at the top to the EXACT VERSION of it that was imported. That way you only bring in what you actually reference, but your database can still integrate with Wikipedia's.
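The nightly check described above could start from something like this sketch. It assumes the MediaWiki Action API on en.wikipedia.org (`action=query&prop=info`, JSON with `formatversion=2`, where existing pages carry `lastrevid` and nonexistent ones carry `missing`); the title normalization is deliberately rough, the User-Agent is a hypothetical placeholder, and how you collect your wiki's link targets and import the matching pages is left out.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
INDEX_URL = "https://en.wikipedia.org/w/index.php"
# Hypothetical User-Agent; Wikimedia asks API clients to identify themselves.
HEADERS = {"User-Agent": "link-sync-sketch/0.1 (you@example.org)"}


def normalize_title(title: str) -> str:
    """Rough Wikipedia-style title: underscores as spaces, first letter upper-cased."""
    title = " ".join(title.replace("_", " ").split())
    return title[:1].upper() + title[1:]


def existence_query(titles: list[str]) -> dict[str, str]:
    return {
        "action": "query",
        "prop": "info",
        "titles": "|".join(titles),
        "format": "json",
        "formatversion": "2",
    }


def parse_existing(response: dict) -> dict[str, int]:
    """Map each title that exists on Wikipedia to its latest revision ID."""
    return {
        page["title"]: page["lastrevid"]
        for page in response["query"]["pages"]
        if "missing" not in page and "invalid" not in page
    }


def permalink(title: str, revid: int) -> str:
    """Link to the exact revision that was imported."""
    return INDEX_URL + "?" + urllib.parse.urlencode({"title": title, "oldid": revid})


def check_links(link_targets: list[str]) -> dict[str, str]:
    """Return {title: permalink} for link targets that already exist on Wikipedia."""
    titles = sorted({normalize_title(t) for t in link_targets})
    found: dict[str, str] = {}
    # The API accepts at most 50 titles per request for normal clients.
    for start in range(0, len(titles), 50):
        params = existence_query(titles[start:start + 50])
        req = urllib.request.Request(
            API_URL + "?" + urllib.parse.urlencode(params), headers=HEADERS)
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        for title, revid in parse_existing(data).items():
            found[title] = permalink(title, revid)
    return found
```

Keying the result on the revision ID rather than just the title is what lets you pin the "EXACT VERSION" link: the `oldid` URL keeps pointing at the imported text even after the live article changes.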
This will also come in immensely handy if you have to support multiple languages, such as Spanish or French, since Wikipedia links each article to the same article in other languages, which translates at least those concepts for you.
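Those cross-language links can be read through the same Action API. This is a sketch assuming `prop=langlinks` with `formatversion=2`, where each link is returned as an object with `lang` and `title`; the optional `lllang` parameter narrows the result to one language, and the User-Agent is a hypothetical placeholder.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

API_URL = "https://en.wikipedia.org/w/api.php"
# Hypothetical User-Agent; Wikimedia asks API clients to identify themselves.
HEADERS = {"User-Agent": "langlinks-sketch/0.1 (you@example.org)"}


def langlinks_query(title: str, lang: Optional[str] = None) -> dict[str, str]:
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllimit": "max",
        "format": "json",
        "formatversion": "2",
    }
    if lang:
        # Restrict the result to a single target language, e.g. "es".
        params["lllang"] = lang
    return params


def parse_langlinks(response: dict) -> dict[str, str]:
    """Map language code -> article title in that language."""
    return {
        link["lang"]: link["title"]
        for page in response["query"]["pages"]
        for link in page.get("langlinks", [])
    }


def translations(title: str, lang: Optional[str] = None) -> dict[str, str]:
    url = API_URL + "?" + urllib.parse.urlencode(langlinks_query(title, lang))
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return parse_langlinks(json.load(resp))
```

Calling `translations("Water", "es")` would return at most one entry, the Spanish article title, which is exactly the kind of concept-level translation mentioned above.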
