
I am using Python 2.7 (Anaconda).

I have used the Wikipedia Python package to extract a list of article titles:

titles = wikipedia.random(pages=1000)  # returns a list of unicode titles
titles_encoded = [x.encode('utf-8') for x in titles]

Is there a way of using

wikipedia.summary(title=titles_encoded, auto_suggest=True, redirect=True).encode('utf-8')

in order to extract multiple articles at once? I have used a for loop, but it takes a really long time:

test = {}  # needs to exist before the loop assigns into it
for n in range(1, 500):
    test[n] = wikipedia.summary(title=titles_encoded[n], auto_suggest=True, redirect=True).encode('utf-8')
    print(n, "text extracted")

I am looking for a solution that is more efficient/faster.
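
Would something like the following be a reasonable direction? This is an untested sketch: I am assuming each wikipedia.summary() call spends most of its time waiting on the network, so a thread pool (multiprocessing.dummy.Pool from the standard library) should let several requests overlap. fetch_summary is just a helper name I made up for this sketch.

from multiprocessing.dummy import Pool  # thread-backed Pool, in the 2.7 stdlib

def fetch_summary(title):
    # One title per call, same flags as above; skip pages that raise
    # (disambiguation, missing, etc.) instead of crashing the whole run.
    try:
        return wikipedia.summary(title, auto_suggest=True, redirect=True).encode('utf-8')
    except wikipedia.exceptions.WikipediaException:
        return None

pool = Pool(8)  # 8 worker threads; tune to taste
summaries = pool.map(fetch_summary, titles_encoded[:500])
pool.close()
pool.join()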

  • I don't think there is an option to fetch summaries in bulk. If you need bulk data, the [free data dumps](https://meta.wikimedia.org/wiki/Data_dump_torrents#enwiki) might be a better way to get the content. – ChrisP May 01 '16 at 18:43
  • I have looked at this, but I have to admit I don't seem capable of parsing those XML formats, which is why I'm looking for raw text. – ishido May 02 '16 at 10:21
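
Following up on the comments above: another route (untested; the parameter names and the per-request cap come from my reading of the MediaWiki TextExtracts documentation, so treat them as assumptions) would be to skip the wikipedia package and ask the MediaWiki API for several plain-text intros in one request using requests:

import requests

API_URL = 'https://en.wikipedia.org/w/api.php'

def fetch_intros(batch):
    # One HTTP request for a whole batch of titles, joined with '|'.
    params = {
        'action': 'query',
        'format': 'json',
        'prop': 'extracts',
        'exintro': 1,        # intro section only
        'explaintext': 1,    # plain text instead of HTML
        'exlimit': 'max',    # as many extracts per request as the API allows
        'titles': '|'.join(batch),
    }
    pages = requests.get(API_URL, params=params).json()['query']['pages']
    return {p.get('title'): p.get('extract', '') for p in pages.values()}

intros = {}
for i in range(0, len(titles), 20):   # batches of 20 (my understanding of the exlimit cap)
    intros.update(fetch_intros(titles[i:i + 20]))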

0 Answers