
How can I get all Wikipedia article titles in one place, without extra characters or page IDs? Just the article titles, something like this:

[image: example of the desired output]

When I download the Wikipedia dump, I get this: [image]

I may know a method that could get me all pages, but I want to get all of them in one go.

  • What have you done by now? I won't downvote your question, but many people here will if you don't ask it correctly. – Ed de Almeida Nov 27 '16 at 01:52
  • I suggest you read this: http://stackoverflow.com/help/how-to-ask – Ed de Almeida Nov 27 '16 at 01:52
  • I've read that but I didn't see what I wanted. –  Nov 27 '16 at 02:54
  • When you asked "what have you done by now", what did you mean? –  Nov 27 '16 at 02:57
  • Possible duplicate of [How to get all Wikipedia article titles with MediaWiki API?](http://stackoverflow.com/questions/29258782/how-to-get-all-wikipedia-article-titles-with-mediawiki-api) – Termininja Nov 27 '16 at 11:24
  • @Termininja I don't think it is, because I just want the article titles without the extra characters showing the page ID. Do you see? –  Nov 27 '16 at 12:25
  • You changed your question, but you still need to clarify what you tried in order to get all titles. For now, the answer below is enough; you just need to parse the response to get only the titles. Until you tell us which language you use, we can't help you more: you can parse them programmatically, or even by hand. – Termininja Nov 27 '16 at 13:24
  • I went to https://en.wikipedia.org/wiki/Special:AllPages to get all titles. What would you want me to parse? –  Nov 27 '16 at 15:01
  • Does anybody know what to do? –  Nov 28 '16 at 09:50

1 Answer


You'll find it on https://dumps.wikimedia.org

The latest list of page titles in the main namespace for English Wikipedia, as a database dump, is in this directory.
There is a subdirectory for each recent day, with separate files for every Wiki project site in every language.
The English Wikipedia titles are in the files named enwiki-yyyymmdd-all-titles-in-ns-0.gz. As of August 2023 this file is 9 MB compressed, 26 MB uncompressed.
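Once the file is downloaded, a minimal Python sketch for turning it into a plain list of titles might look like the following. It assumes the URL pattern of the `.../other/pagetitles/20221229/enwiki-20221229-all-titles-in-ns-0.gz` link quoted in the comments holds for other dates, and that the file contains one title per line with underscores in place of spaces, possibly preceded by a `page_title` header line. `dump_url`, `clean_titles` and `read_titles` are hypothetical helper names.

```python
import gzip
from typing import Iterable, Iterator

# Assumed URL pattern, copied from the 2022-12-29 link in the comments.
DUMP_URL = ("https://dumps.wikimedia.org/other/pagetitles/"
            "{date}/{wiki}-{date}-all-titles-in-ns-0.gz")


def dump_url(date: str, wiki: str = "enwiki") -> str:
    """Build the (assumed) download URL for a dump date in yyyymmdd form."""
    return DUMP_URL.format(date=date, wiki=wiki)


def clean_titles(lines: Iterable[str]) -> Iterator[str]:
    """Turn raw dump lines into readable article titles."""
    for i, line in enumerate(lines):
        title = line.rstrip("\n")
        if i == 0 and title == "page_title":
            continue  # assumed column header, not an article
        if title:
            # The dump stores titles with underscores instead of spaces.
            yield title.replace("_", " ")


def read_titles(path: str) -> Iterator[str]:
    """Stream titles from a downloaded *-all-titles-in-ns-0.gz file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        yield from clean_titles(f)
```

For example, download with `urllib.request.urlretrieve(dump_url("20221229"), "titles.gz")`, then `for t in read_titles("titles.gz"): print(t)`. Converting underscores back to spaces is safe because MediaWiki treats the two as equivalent in titles.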

If you would rather get them through the API, use action=query with list=allpages, but that only gives you a maximum of 500 titles (5,000 for bots) per request, so you will have to make more than 10,000 API calls for English Wikipedia.

Example: https://en.wikipedia.org/w/api.php?action=query&format=xml&list=allpages&aplimit=max
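As a rough sketch of the API route, the Python below requests `list=allpages` in batches and copies the `continue` values from each response into the next request until the API stops returning them, keeping only each page's `title` (so no page IDs). Assumptions: it uses `format=json` instead of the XML in the example URL, restricts results to the main namespace with `apnamespace=0`, and the `User-Agent` string is a placeholder you should replace with your own. The `fetch` parameter is a hypothetical hook so the paging logic can be tried without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

API = "https://en.wikipedia.org/w/api.php"


def fetch_json(params: dict) -> dict:
    """Perform one GET request against the API and decode the JSON reply."""
    url = API + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent; Wikimedia asks clients to identify themselves.
    req = urllib.request.Request(url, headers={"User-Agent": "title-lister-example/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def all_titles(fetch: Callable[[dict], dict] = fetch_json) -> Iterator[str]:
    """Yield every main-namespace title, following API continuation."""
    params = {
        "action": "query",
        "format": "json",
        "list": "allpages",
        "apnamespace": 0,
        "aplimit": "max",  # 500 per call for normal users, 5000 for bots
    }
    while True:
        data = fetch(params)
        for page in data["query"]["allpages"]:
            yield page["title"]  # keep only the title, drop pageid and ns
        if "continue" not in data:
            break  # last batch reached
        # Carry apcontinue (and the generic "continue" key) into the next call.
        params = {**params, **data["continue"]}
```

For example, `for t in all_titles(): print(t)` prints one title per line. Expect this to run for a long time on English Wikipedia, given the number of requests involved.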

Edited by greybeard; answered by Ainali.
  • I just want the titles of the articles. Would you like me to show you what I want? –  Nov 27 '16 at 10:13
  • From the database dump you just need to SELECT the column with titles and you will have a plain list. – Ainali Nov 27 '16 at 10:27
  • Does anybody know what to do? –  Nov 28 '16 at 09:51
  • I found the titles here: http://ftp.acc.umu.se/mirror/wikimedia.org/dumps/enwiki/20180320/ Check enwiki-latest-all-titles.gz and enwiki-latest-all-titles-in-ns0.gz. – Tomasz Kuter Apr 04 '18 at 15:51
  • There's a dead link in this answer; here is the database dump (as of Dec 29, 2022): https://dumps.wikimedia.org/other/pagetitles/20221229/enwiki-20221229-all-titles-in-ns-0.gz – nonimportant Dec 29 '22 at 10:37