17

I used pd.read_html() to import a table from a webpage, but instead of returning a dataframe it returned a list. How can I get the data as a dataframe? Thank you!

The code is the following:

import pandas as pd
import html5lib

url = 'http://www.fdic.gov/bank/individual/failed/banklist.html'
dfs = pd.read_html(url)
type(dfs)

Out[1]: list
alecxe
AlK

2 Answers

21

.read_html() returns a list of dataframes, because an HTML source can contain multiple tables. Get the one you want by index. In your case, there is a single dataframe:

dfs = pd.read_html(url)
df = dfs[0]
print(df)

Note that if there are no tables in the HTML source, .read_html() raises a ValueError; it never returns an empty list.
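As a minimal sketch of that pattern, the snippet below parses a small inline HTML string instead of the FDIC page. The markup, the bank names, and the `first_table` helper are made up for illustration, and it assumes a parser that pandas can use (lxml, or html5lib together with BeautifulSoup) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the downloaded page: a single small table.
SAMPLE_HTML = """
<table>
  <tr><th>Bank Name</th><th>City</th></tr>
  <tr><td>Example Bank</td><td>Springfield</td></tr>
  <tr><td>Sample Savings</td><td>Riverton</td></tr>
</table>
"""


def first_table(source):
    """Hypothetical helper: return the first table as a DataFrame, or None."""
    try:
        tables = pd.read_html(source)  # always a list of DataFrames
    except ValueError:
        # pandas raises ValueError when it finds no tables in the source
        return None
    return tables[0]  # pick the table you want by position


df = first_table(StringIO(SAMPLE_HTML))
print(type(df).__name__)
print(df)
```

Passing the URL from the question in place of the `StringIO` object should work the same way, since `pd.read_html` accepts either.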

alecxe
  • Thank you! Indeed df is a dataframe. However when I tried the .head, .tail and .index parameters on df Python returned error messages. How can I correct this? – AlK Sep 26 '16 at 19:45
  • @AlexanderKonstantinidis interesting, `df.tail`, `df.head`, `df.index` work for me. What error(s) are you getting? – alecxe Sep 26 '16 at 19:48
  • AttributeError: 'DataFrame' object has no attribute 'heads', AttributeError: 'DataFrame' object has no attribute 'tails', TypeError: 'Index' object is not callable – AlK Sep 26 '16 at 19:51
  • 2
    @AlexanderKonstantinidis ah, it is `tail`, `head` and `index`, with no `s` suffix. `head()` and `tail()` are methods and need the parentheses; `index` is an attribute, so it takes none. – alecxe Sep 26 '16 at 19:56
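To make the names and call syntax from this comment thread concrete, here is a small sketch. The dataframe is a hand-built stand-in for `dfs[0]`, with made-up values, so it runs without fetching the page.

```python
import pandas as pd

# Hypothetical stand-in for the table returned by dfs[0].
df = pd.DataFrame(
    {
        "Bank Name": ["Bank A", "Bank B", "Bank C", "Bank D"],
        "City": ["Austin", "Boise", "Camden", "Dover"],
    }
)

first_two = df.head(2)  # method: call it, optionally with a row count
last_two = df.tail(2)   # method: call it, optionally with a row count
row_labels = df.index   # attribute: no parentheses

print(first_two)
print(last_two)
print(row_labels)
```

Writing `df.heads` or `df.tails` raises the AttributeError quoted above, and `df.index()` raises the TypeError, because an `Index` object is not callable.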
8
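import pandas as pd
import html5lib  # parser backend that read_html can use
url = 'http://www.fdic.gov/bank/individual/failed/banklist.html'
dfs = pd.read_html(url)  # list of dataframes, one per table found
df = pd.concat(dfs)  # stack the tables into a single dataframe
df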
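A short sketch of what `pd.concat` does with such a list, using two hand-built dataframes (made-up values) in place of the result of `pd.read_html`. Passing `ignore_index=True` is an addition not in the code above; it renumbers the rows so the combined index has no duplicates.

```python
import pandas as pd

# Hypothetical stand-ins for the list that pd.read_html returns.
dfs = [
    pd.DataFrame({"Bank Name": ["Bank A", "Bank B"], "City": ["Austin", "Boise"]}),
    pd.DataFrame({"Bank Name": ["Bank C"], "City": ["Camden"]}),
]

# Stack every table into one dataframe and renumber the rows from 0.
df = pd.concat(dfs, ignore_index=True)
print(df)
```

This is a good fit when the page has one table, or several tables with the same columns. If the tables have different columns, the result is padded with NaN, and picking a single table by index is usually cleaner.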
Ajay Kumar Ganesh
Nikhil Chawla
  • This was helpful to me. It solved the problem of a scraped table arriving as a list by converting that list into a dataframe. – Tony S Nov 30 '19 at 16:57