I have a list of strings called cities, where each string is a city name that is also the title of a Wikivoyage page. For each city, I fetch the page and look at its text content:
import pywikibot

# graph is defined elsewhere (not shown)
cities = [n["name"] for n in graph.nodes.match("City")]
for city in cities:
    site = pywikibot.Site(code="en", fam="wikivoyage")
    page = pywikibot.Page(site, city)
    text = page.text
One of the cities in my list is a place called L'Aquila, and it was not returning anything for text (whereas other entries were). I figured that was because of the ' in the name, so I used re.sub to escape the ' and pass in that result instead. This gives me what I expected:
import re

cities = [(n["name"]) for n in graph.nodes.match("City")]
city = "L'Aquila"
altered_city = re.sub("'", "\'", city)
print(altered_city)
site = pywikibot.Site(code="en", fam="wikivoyage")
page = pywikibot.Page(site, altered_city)
print(page)
print(page.text)
Result:
[[wikivoyage:en:L'Aquila]]
{{pagebanner|Pagebanner default.jpg}}
'''L'Aquila''' is the capital of the province of the same name in the region of [[Abruzzo]] in [[Italy]] and is located in the northern part of the..
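As a side check, the escape step may not be changing anything: in Python source, "\'" is the same one-character string as "'", so the re.sub call should return the name unchanged. A minimal sketch that verifies this without touching pywikibot:

```python
import re

city = "L'Aquila"

# "\'" is just an escaped single quote, i.e. the one-character string "'"
replacement = "\'"
altered_city = re.sub("'", replacement, city)

print(len(replacement))        # number of characters in the replacement
print(altered_city == city)    # True if the substitution changed nothing
```

If the comparison prints True, the hard-coded version works because of the string literal itself, not because of the substitution.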
The issue is that I don't want to hard-code the city name; I want to use the strings from my list. But when I pass in a string from the list, I get nothing back for page.text:
cities = [(n["name"]) for n in graph.nodes.match("City")]
city_from_list = cities[0]
print(city_from_list)
print(type(city_from_list))
altered_city = re.sub("'", "\'", city_from_list)
site = pywikibot.Site(code="en", fam="wikivoyage")
page = pywikibot.Page(site, altered_city)
print(page)
print(page.text)
Result:
L'Aquila
<class 'str'>
[[wikivoyage:en:L'Aquila]]
I printed the value and type of the city element I'm getting from the list, and it is a str, so I have no idea why it worked above but not here. How are these different?
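One way to check whether the two strings are really identical is to compare them character by character, since print() can show look-alike characters the same way. The sketch below is only a diagnostic under an assumption: from_list is a hypothetical stand-in for cities[0], and the U+2019 apostrophe in it is a guess at what might differ, not something confirmed by the output above. The helper name describe is also made up for illustration.

```python
import unicodedata


def describe(s):
    """Return a (code point, Unicode name) pair for each character of s."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in s]


hard_coded = "L'Aquila"
# Hypothetical stand-in for cities[0]: uses U+2019 instead of U+0027
from_list = "L\u2019Aquila"

# ascii() escapes every non-ASCII character, so hidden differences show up
print(ascii(hard_coded))
print(ascii(from_list))
print(hard_coded == from_list)

# List only the positions where the two strings disagree
for index, (a, b) in enumerate(zip(describe(hard_coded), describe(from_list))):
    if a != b:
        print(index, a, b)
```

Replacing from_list with the real cities[0] would show whether the stored name contains a different apostrophe, stray whitespace, or some other invisible character.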