I have troubles sorting a wiki table and hope someone who has done it before can give me advice.
From the List_of_current_heads_of_state_and_government I need countries (works with the code below) and then only the first mention of Head of state + their names. I am not sure how to isolate the first mention as they all come in one cell. And my attempt to pull their names gives me this error: IndexError: list index out of range
. Will appreciate your help!
import requests
from bs4 import BeautifulSoup
wiki = "https://en.wikipedia.org/wiki/List_of_current_heads_of_state_and_government"
website_url = requests.get(wiki).text
soup = BeautifulSoup(website_url,'lxml')
my_table = soup.find('table',{'class':'wikitable plainrowheaders'})
#print(my_table)
states = []
titles = []
names = []
for row in my_table.find_all('tr')[1:]:
state_cell = row.find_all('a')[0]
states.append(state_cell.text)
print(states)
for row in my_table.find_all('td'):
title_cell = row.find_all('a')[0]
titles.append(title_cell.text)
print(titles)
for row in my_table.find_all('td'):
name_cell = row.find_all('a')[1]
names.append(name_cell.text)
print(names)
Desirable output would be a pandas df:
State | Title | Name |