I am trying to get the table of contents of a Wikipedia page using the Wikipedia API library for Python. Here is my code:

>>> import wikipedia
>>> ny = wikipedia.page("New York")
>>> ny.sections

But I get an empty list, [], as the result. When I check the page itself, its table of contents clearly has entries. Everything else described in the documentation seems to work except this. I am new to Python, coming from a Java background.

mahacoder

3 Answers

There is a bug in the current version of the Wikipedia API Python library. You can install a fork by lucasdnd on GitHub that fixes it:

pip install git+https://github.com/lucasdnd/Wikipedia.git

(Add --upgrade to the command if you already have the library installed.)

Now:

>>> import wikipedia
>>> ny = wikipedia.page("New York")
>>> ny.sections
[u'History', u'16th century', u'17th century', u'18th century, the American Revolution, and statehood', u'19th century', u'Immigration', u'September 11, 2001 attacks', u'Hurricane Sandy, 2012', u'Geography', u'Climate', u'Statescape', u'Regions', u'Adjacent geographic entities', u'State parks', u'National parks', u'Administrative divisions', u'Demographics', u'Population', u'Most populous counties', u'Major cities', u'Metropolitan areas', u'Racial and ancestral makeup', u'Languages', u'Religion', u'LGBT', u'Economy', u'Wall Street', u'Silicon Alley', u'Microelectronic hardware and photographic processing', u'Media and entertainment', u'Tourism', u'Exports', u'Education', u'Transportation', u'Government and politics', u'Government', u'Capital punishment', u'Federal representation', u'Politics', u'Sports', u'See also', u'References', u'Further reading', u'External links'] 

It'll hopefully be fixed in the main library sometime soon.
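
If installing a fork is not an option, the section list can probably also be read straight from the MediaWiki API, without the wikipedia package. The sketch below assumes the action=parse endpoint with prop=sections, whose JSON reply carries each heading in a "line" field. The helper names and the User-Agent string are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_sections_url(title):
    """Build an action=parse request URL that asks only for the section list."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "sections",
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def section_titles(response):
    """Extract the headings from a decoded action=parse JSON response.

    Each entry's "line" field holds the heading text; it may contain
    inline HTML markup for headings that use formatting.
    """
    return [section["line"] for section in response["parse"]["sections"]]


def fetch_section_titles(title):
    """Fetch the section headings of a page (needs network access)."""
    request = urllib.request.Request(
        build_sections_url(title),
        # Placeholder user agent; replace with one that identifies your project.
        headers={"User-Agent": "toc-example/0.1 (placeholder contact)"},
    )
    with urllib.request.urlopen(request) as reply:
        return section_titles(json.load(reply))
```

Calling fetch_section_titles("New York (state)") should then return the headings as a flat list, in page order. The nesting depth is available from the "toclevel" field of each entry if you need it.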

slaporte

I was facing the same issue. Since almost 3 years have passed and it doesn't look like it will get fixed, I created another simple library: Wikipedia-API.

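import wikipediaapi

# Note: recent Wikipedia-API releases may also require a user agent, e.g.
# wikipediaapi.Wikipedia(user_agent='MyProject (me@example.com)', language='en')
wiki = wikipediaapi.Wikipedia('en')
mutcd = wiki.page('Comparison of MUTCD-Influenced Traffic Signs')
# Print the title of each top-level section, one per line
print("\n".join(s.title for s in mutcd.sections))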

Output:

Places
Media and entertainment
Sports
Ships
Other uses
See also
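
The snippet above lists only top-level sections. As far as I know, each section object in Wikipedia-API also exposes its own .sections list, so a full table of contents can be built by recursing. This is a minimal sketch that relies only on the .title and .sections attributes; toc_lines and the Section stand-in are hypothetical names used so the example runs without network access.

```python
from collections import namedtuple


def toc_lines(sections, depth=0):
    """Return one indented line per section, descending into subsections."""
    lines = []
    for section in sections:
        lines.append("  " * depth + section.title)
        lines.extend(toc_lines(section.sections, depth + 1))
    return lines


# Stand-in for the library's section objects (assumption: they expose
# .title and .sections in the same way).
Section = namedtuple("Section", ["title", "sections"])

demo = [
    Section("History", [Section("19th century", [])]),
    Section("Geography", []),
]
print("\n".join(toc_lines(demo)))
```

With a real page, toc_lines(mutcd.sections) should work the same way.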

Martin Majlis

The latest version has a similar bug:

>>> wikipedia.summary('Creativity')
PageError: Page id "creatity" does not match any pages. Try another id!
>>> wikipedia.page('Creativity')
PageError: Page id "creatity" does not match any pages. Try another id!
>>> wikipedia.suggest('Creativity')
'creatity'
>>> wikipedia.search('Creativity')
['Creativity',
 'Creativity (religion)',
 'Creativity and mental health',
...
PageError: Page id "creatity" does not match any pages. Try another id!
>>> wikipedia.page('creativity')
PageError: Page id "creatity" does not match any pages. Try another id!

Lowercasing and similar tweaks don't help. Adding the "(religion)" qualifier does, but that is only useful if the religion page is the one you want.

Digging into the source code and Wikipedia API, I found it was Wikipedia's suggest API that was returning the invalid page title suggestion. You may be able to turn off auto_suggest if you're sure your page title ("New York") exists:

>>> wikipedia.page('Creativity', auto_suggest=False)
<WikipediaPage 'Creativity'>
>>> wikipedia.page('New York', auto_suggest=False)
DisambiguationError: "New York" may refer to: 
New York City
New York (state)
...
>>> wikipedia.page('New York City', auto_suggest=False)
<WikipediaPage 'New York City'>

Several pull requests implementing fixes have been opened over the past 6 months, but none has been reviewed yet: https://github.com/goldsmith/Wikipedia/pull/305

hobs