Here is my code.

from bs4 import BeautifulSoup
import pandas as pd
import requests

link1 = 'https://en.wikipedia.org/wiki/List_of_United_States_counties_by_per_capita_income'
page1 = requests.get(link1) 
soup1 = BeautifulSoup(page1.text)
table = soup1.find_all('table')[1]
table_rows = table.find_all('tr')
res = []
for tr in table_rows:
    td = tr.find_all('td')
    row = [tr.text.strip() for tr in td if tr.text.strip()]
    if row:
        res.append(row)
df_state = pd.DataFrame(res, columns=["Rank", "Country-equivalent", "State", "Per capita income", "Population"])
df_state.head()

The error occurs at the last line. Any help would be very much appreciated. Thanks.

Jona
Taha
  • Possible duplicate of [Panda AssertionError columns passed, passed data had 2 columns](https://stackoverflow.com/questions/38927230/panda-assertionerror-columns-passed-passed-data-had-2-columns) – Francisca Concha-Ramírez Nov 26 '19 at 15:53
  • Here are the first two elements of `res`: `[['This article is part of a series on'], ['Topics\nHousehold\nPersonal\nAffluence\nSocial class\nIncome inequality\ngender pay gap\nethnic wage gap']]`. I don't think that's right. – Jona Nov 26 '19 at 16:06

1 Answer

Here is code that gives the result you expect:

from bs4 import BeautifulSoup
import pandas as pd
import requests

link1 = 'https://en.wikipedia.org/wiki/List_of_United_States_counties_by_per_capita_income'
page1 = requests.get(link1) 
soup1 = BeautifulSoup(page1.text, 'html.parser')  # name the parser explicitly to avoid the "no parser specified" warning
table = soup1.find_all('table')[2]
table_rows = table.find_all('tr')
res = []
for tr in table_rows:
    cells = tr.find_all('td')
    row = [cell.text.strip() for cell in cells if cell.text.strip()]  # separate name so `tr` is not shadowed
    if row:
        res.append(row)
df_state = pd.DataFrame(res, columns=["Rank", "Country-equivalent", "State", "Per capita income", "Population", "temp1", "temp2", "temp3"])
df_state.head()

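The first change is in the line that selects the table: I take index 2 instead of 1 from soup1.find_all('table'). With index 1 you get the "This article is part of a series on" sidebar rather than the income table.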
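
A hard-coded index can break if the page layout changes, so here is a rough sketch of picking the table by one of its header cells instead. The `SAMPLE_HTML` string and the `find_table_with_header` helper are made up for illustration (they are not the real Wikipedia markup or a BeautifulSoup function), and I am assuming the table you want has a `<th>` whose text is exactly "Rank".

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, NOT the real Wikipedia HTML.
SAMPLE_HTML = """
<table><tr><td>This article is part of a series on</td></tr></table>
<table>
  <tr><th>Rank</th><th>County</th><th>State</th></tr>
  <tr><td>1</td><td>Example County</td><td>Example State</td></tr>
</table>
"""


def find_table_with_header(soup, header_text):
    """Return the first <table> containing a <th> equal to header_text, or None."""
    for table in soup.find_all("table"):
        headers = [th.get_text(strip=True) for th in table.find_all("th")]
        if header_text in headers:
            return table
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = find_table_with_header(soup, "Rank")

# Collect the <td> text of every row; the header row has no <td>, so it comes out empty.
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in table.find_all("tr")
]
rows = [row for row in rows if row]  # drop empty rows
```

On the real page you would pass the soup built from `page1.text` and check for `None` before using the result.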

The second issue was that you passed 5 column names when creating the DataFrame, whereas each row has 8 values. I added "temp1", "temp2", "temp3" as names for the extra columns.
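
To catch this kind of mismatch before calling pd.DataFrame, you could count how many values each scraped row has and extend the column names to match. This is only a sketch: the `res` rows below are invented sample data, and `row_width_counts` and `pad_column_names` are hypothetical helpers, not pandas functions.

```python
from collections import Counter


def row_width_counts(rows):
    """Count how many rows have each length, e.g. Counter({8: 2})."""
    return Counter(len(row) for row in rows)


def pad_column_names(names, width, prefix="temp"):
    """Extend names with prefix1, prefix2, ... until there are `width` of them."""
    if width < len(names):
        raise ValueError("rows are narrower than the given column names")
    extra = [f"{prefix}{i}" for i in range(1, width - len(names) + 1)]
    return list(names) + extra


# Invented rows standing in for the scraped `res` list (8 values each).
res = [
    ["1", "County A", "State A", "$1", "10", "x", "y", "z"],
    ["2", "County B", "State B", "$2", "20", "x", "y", "z"],
]
names = ["Rank", "Country-equivalent", "State", "Per capita income", "Population"]

widths = row_width_counts(res)      # shows at a glance whether all rows agree
width = max(widths)                 # widest row found
columns = pad_column_names(names, width)
# `columns` now has 8 entries and could be passed as pd.DataFrame(res, columns=columns)
```

If `widths` reports more than one row length, it is worth looking at the odd rows before building the DataFrame.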

Jona