I'm trying to get the plain text of a section (without HTML/CSS, special characters, escape sequences like \n, links, or images) using the Wikipedia API. I'm trying to do that with this code:
import requests

API_URL = 'http://en.wikipedia.org/w/api.php'

def get_section(page, section):
    # action=parse returns the section as rendered HTML, not plain text
    search_params = {
        'action': 'parse',
        'prop': 'text',
        'pageid': page,
        'section': section,
        'format': 'json'
    }
    response = requests.get(API_URL, params=search_params)
    return response.json()

text = get_section(23862, 2)
# .strip() belongs on the string; print() itself returns None
print(text['parse']['text']['*'].strip())
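Since `action=parse` hands back HTML, one stdlib-only option is to strip the tags locally. This is a minimal sketch, assuming Python's `html.parser` is good enough for this markup; `html_to_text` and the set of skipped tags are illustrative choices, not part of the API. Skipping `sup` drops reference markers like [1] but also any genuine superscripts, and section edit links such as "[edit]" are not removed.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, ignoring everything inside the SKIP tags."""

    # Illustrative choice: styles, scripts, reference superscripts, tables/infoboxes
    SKIP = {"style", "script", "sup", "table"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so &amp; arrives as "&"
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Hypothetical helper: crude HTML-to-plain-text conversion."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    text = "".join(parser.parts)
    # Collapse newlines, tabs and repeated spaces into single spaces
    return re.sub(r"\s+", " ", text).strip()
```

With the code above, `html_to_text(text['parse']['text']['*'])` would replace the raw HTML in the `print` call. Adjacent block elements with no whitespace between them will run together, so this is a rough cleanup rather than a faithful renderer.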
It raises this error:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2014' in position 5722: character maps to <undefined>
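The error presumably comes from `print` rather than the API: the output stream's encoding (for example a Windows console code page such as cp437) has no em dash (`\u2014`). A minimal sketch, assuming you can live with unrepresentable characters becoming `?`; `console_safe` and `safe_print` are hypothetical helper names.

```python
import sys


def console_safe(text, encoding):
    """Return text with characters the target encoding can't represent replaced by '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)


def safe_print(text):
    # Use the encoding of the current stdout (e.g. a console code page);
    # fall back to UTF-8 if the stream doesn't report one.
    encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
    print(console_safe(text, encoding))
```

If the characters should be kept instead, `sys.stdout.reconfigure(encoding="utf-8")` (Python 3.7+) or setting the `PYTHONIOENCODING=utf-8` environment variable avoids the error, provided the terminal can display UTF-8. Writing the text to a file opened with `encoding="utf-8"` sidesteps the console entirely.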
I need to get article sections the same way I can get the article intro with the exintro parameter:
https://en.wikipedia.org/w/api.php?action=query&prop=extracts&exintro&explaintext&pageids=23862
It returns plain text, which is exactly what I need.
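Assuming the extracts module has no per-section option, one workaround is to request the whole page with `explaintext` and `exsectionformat=wiki`, which marks headings as `== Heading ==` lines, and split the text locally. This is a sketch: `fetch_plain_extract`, `split_sections` and the User-Agent string are hypothetical names/placeholders, and it uses the standard library instead of `requests` only to stay self-contained.

```python
import json
import re
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"

# A heading line such as "== History ==" or "=== Early life ==="
HEADING_RE = re.compile(r"^(={2,})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def fetch_plain_extract(pageid):
    """Fetch the whole page as plain text with wiki-style section headings."""
    params = {
        "action": "query",
        "prop": "extracts",
        "explaintext": 1,
        "exsectionformat": "wiki",
        "pageids": pageid,
        "format": "json",
    }
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent; Wikimedia asks clients to identify themselves
    req = urllib.request.Request(url, headers={"User-Agent": "section-extract-example/0.1"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["query"]["pages"][str(pageid)]["extract"]


def split_sections(text):
    """Split a wiki-formatted plain-text extract into (title, body) pairs.

    The lead section comes first with an empty title; each later entry is one
    heading (any level) followed by the text up to the next heading.
    """
    matches = list(HEADING_RE.finditer(text))
    first = matches[0].start() if matches else len(text)
    sections = [("", text[:first].strip())]
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append((m.group(2), text[m.end():end].strip()))
    return sections
```

Something like `split_sections(fetch_plain_extract(23862))[2]` is the rough counterpart of `section=2` in the parse call, but the numbering may not line up exactly, since the plain-text extract can omit some content that `action=parse` counts.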