
I am using Python 2.7 (Anaconda).

I have used the Wikipedia Python package to extract a list of article titles:

titles = wikipedia.random(pages=1000)  # returns a list of unicode titles
titles_encoded = [x.encode('utf-8') for x in titles]

Is there a way of using

wikipedia.summary(title=titles_encoded, auto_suggest=True, redirect=True).encode('utf-8')

in order to extract multiple articles at once? I have used a for loop, but it takes a really long time:

test = {}  # needs to exist before the loop assigns into it
for n in range(1, 500):
    test[n] = wikipedia.summary(title=titles_encoded[n], auto_suggest=True, redirect=True).encode('utf-8')
    print(n, "text extracted")

I am looking for a solution that is more efficient/faster.
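
Would something like the following be a reasonable direction? This is an untested sketch: I am assuming each wikipedia.summary() call spends most of its time waiting on the network, so a thread pool (multiprocessing.dummy.Pool from the standard library) should let several requests overlap. fetch_summary is just a helper name I made up for this sketch.

from multiprocessing.dummy import Pool  # thread-backed Pool, in the 2.7 stdlib

def fetch_summary(title):
    # One title per call, same flags as above; skip pages that raise
    # (disambiguation, missing, etc.) instead of crashing the whole run.
    try:
        return wikipedia.summary(title, auto_suggest=True, redirect=True).encode('utf-8')
    except wikipedia.exceptions.WikipediaException:
        return None

pool = Pool(8)  # 8 worker threads; tune to taste
summaries = pool.map(fetch_summary, titles_encoded[:500])
pool.close()
pool.join()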

  • I don't think there is an option to fetch summaries in bulk. If you need bulk data, the [free data dumps](https://meta.wikimedia.org/wiki/Data_dump_torrents#enwiki) might be a better way to get the content. – ChrisP May 01 '16 at 18:43
  • I have looked at this, but I have to admit I don't seem capable of parsing those XML formats, which is why I'm looking for raw text. – ishido May 02 '16 at 10:21
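
Following up on the comments above: another route (untested; the parameter names and the per-request cap come from my reading of the MediaWiki TextExtracts documentation, so treat them as assumptions) would be to skip the wikipedia package and ask the MediaWiki API for several plain-text intros in one request using requests:

import requests

API_URL = 'https://en.wikipedia.org/w/api.php'

def fetch_intros(batch):
    # One HTTP request for a whole batch of titles, joined with '|'.
    params = {
        'action': 'query',
        'format': 'json',
        'prop': 'extracts',
        'exintro': 1,        # intro section only
        'explaintext': 1,    # plain text instead of HTML
        'exlimit': 'max',    # as many extracts per request as the API allows
        'titles': '|'.join(batch),
    }
    pages = requests.get(API_URL, params=params).json()['query']['pages']
    return {p.get('title'): p.get('extract', '') for p in pages.values()}

intros = {}
for i in range(0, len(titles), 20):   # batches of 20 (my understanding of the exlimit cap)
    intros.update(fetch_intros(titles[i:i + 20]))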

0 Answers