Get plain text from Wikipedia API by sections

Question

I'm trying to get the plain text of a section (no HTML/CSS, no escape characters such as \n, no links or images) using the Wikipedia API. I am trying to do that with this code:

import requests

API_URL = 'http://en.wikipedia.org/w/api.php'

def get_section(page, section):
    search_params = {
        'action': 'parse',
        'prop': 'text',
        'pageid': page,
        'section': section,
        'format': 'json'
    }

    response = requests.get(API_URL, params=search_params)

    return response.json()

text = get_section(23862, 2)
# .strip() belongs on the string, not on the return value of print()
print(text['parse']['text']['*'].strip())

It raises this error:

UnicodeEncodeError: 'charmap' codec can't encode character '\u2014' in position 5722: character maps to <undefined>
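
For context, this error is raised by `print`, not by the API call: the console's legacy code page has no mapping for U+2014, a dash that appears in the article text. A minimal sketch of one workaround, assuming the output only needs to be readable rather than byte-exact; `console_safe` is a hypothetical helper name, not part of `requests` or the Wikipedia API:

```python
import sys

def console_safe(text, encoding=None):
    """Return text with characters the target encoding cannot represent replaced by '?'."""
    # Use the console's own encoding when none is given; fall back to ASCII if unknown.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, errors="replace").decode(encoding)

# U+2014 cannot be encoded as ASCII, so it is replaced instead of raising.
print(console_safe("Python \u2014 a language", "ascii"))  # prints: Python ? a language
```

On Python 3.7 and later, calling `sys.stdout.reconfigure(encoding="utf-8")` once at startup is another option, provided the terminal can actually display UTF-8.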

I need to get other article sections the same way the exintro parameter gets the article intro:

https://en.wikipedia.org/w/api.php?action=query&prop=extracts&exintro&explaintext&pageids=23862

It returns plain text, which is exactly what I need.
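
One possible route with the same `extracts` module: drop `exintro` to get the whole page as plain text, and request `exsectionformat=wiki` so that section headings come back as `== Heading ==` lines that can be split locally. A rough sketch, assuming that heading format; `split_sections` is a made-up helper name, and the request itself is only described in a comment:

```python
import re

# Assumed request parameters: action=query, prop=extracts, explaintext=1,
# exsectionformat=wiki, pageids=23862, format=json
# Matches heading lines such as "== History ==" or "=== Early years ===".
HEADING_RE = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)

def split_sections(extract):
    """Split a plain-text extract into a list of (heading, text) pairs."""
    sections = []
    title = "Intro"  # label chosen here for the text before the first heading
    pos = 0
    for match in HEADING_RE.finditer(extract):
        # Everything between the previous heading and this one belongs to `title`.
        sections.append((title, extract[pos:match.start()].strip()))
        title = match.group(2)
        pos = match.end()
    sections.append((title, extract[pos:].strip()))
    return sections
```

The result is a list rather than a dict because a page can repeat a heading; a section can then be picked by position, for example `split_sections(extract)[2]`. Note that these positions count headings at every level, so they may not line up with the `section` numbers used by `action=parse`.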

It will be better if you post your code with the question. That way we can help you better. — user4221591, Apr 04 '19 at 17:45
Did I answer your question? Please mark my answer as accepted! — aleskva, Mar 06 '21 at 20:32

score 1 · Answer 1 · answered May 06 '19 at 08:26

I would suggest using Pywikibot for this. It includes a handy module, pywikibot/data/api.py, that you can use directly. Start here: https://www.mediawiki.org/wiki/Manual:Pywikibot/Create_your_own_script and then look through api.py to see which options are available for getting the results you want.
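
If installing Pywikibot is more than the task needs, the HTML that `action=parse` already returns in the question's code can be reduced to text with the standard library. A minimal sketch, and not Pywikibot's API; `TextOnly` and `html_to_text` are names made up here, and skipping only `style` and `script` content is a simplification that will not remove things like edit links or reference markers:

```python
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Collect text nodes, ignoring anything inside <style> or <script>."""
    SKIP = {"style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &amp; etc. into characters
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

def html_to_text(html):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # Collapse newlines and runs of spaces into single spaces.
    return " ".join("".join(parser.parts).split())
```

With the question's code this would be called as `html_to_text(text['parse']['text']['*'])`.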
