pd.read_html() imports a list rather than a dataframe

Question

I used pd.read_html() to import a table from a webpage but instead of structuring the data as a dataframe Python imported it as a list. How can I import the data as a dataframe? Thank you!

The code is the following:

import pandas as pd
import html5lib

url = 'http://www.fdic.gov/bank/individual/failed/banklist.html'
dfs = pd.read_html(url)
type(dfs)

Out[1]: list

score 21 · Answer 1 · answered Sep 26 '16 at 19:36

`pd.read_html()` returns a list of DataFrames, because an HTML source can contain multiple tables; get the one you want by index. In your case, there is a single DataFrame:

dfs = pd.read_html(url)
df = dfs[0]
print(df)

Note that if the HTML source contains no tables, `read_html()` raises an error (a `ValueError`) rather than returning an empty list.
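For illustration, here is a minimal sketch of the behaviour described above, run against a small inline HTML snippet instead of the FDIC page so that it needs no network access. The sample table, the `SAMPLE_HTML` name and the `first_table` helper are made up for this example, and it assumes an HTML parser that pandas can use (lxml, or BeautifulSoup with html5lib) is installed.

```python
from io import StringIO
from typing import Optional

import pandas as pd

# Hypothetical sample page with one table (not the real FDIC data).
SAMPLE_HTML = """
<html><body>
<table>
  <thead><tr><th>Bank Name</th><th>City</th></tr></thead>
  <tbody>
    <tr><td>Example Bank</td><td>Springfield</td></tr>
    <tr><td>Sample Trust</td><td>Riverton</td></tr>
  </tbody>
</table>
</body></html>
"""


def first_table(html_text: str) -> Optional[pd.DataFrame]:
    """Return the first table found in html_text, or None if there are none."""
    try:
        # read_html always returns a list, one DataFrame per <table>.
        tables = pd.read_html(StringIO(html_text))
    except ValueError:
        # pandas raises ValueError when it finds no tables,
        # so an empty list never comes back.
        return None
    return tables[0]


df = first_table(SAMPLE_HTML)
print(type(df))
print(df)
```

Wrapping the call in a helper like this is only one way to handle the no-tables case; catching the `ValueError` at the call site works just as well.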

answered Sep 26 '16 at 19:36

alecxe

Thank you! Indeed df is a dataframe. However when I tried the .head, .tail and .index parameters on df Python returned error messages. How can I correct this? – AlK Sep 26 '16 at 19:45
@AlexanderKonstantinidis interesting, `df.tail`, `df.head`, `df.index` work for me. What error(s) are you getting? – alecxe Sep 26 '16 at 19:48
AttributeError: 'DataFrame' object has no attribute 'heads', AttributeError: 'DataFrame' object has no attribute 'tails', TypeError: 'Index' object is not callable – AlK Sep 26 '16 at 19:51
@AlexanderKonstantinidis ah, it is `tail`, `head` and `index`, with no `s` suffix. `head` and `tail` are methods, so call them as `df.head()` and `df.tail()`; `index` is an attribute, so it takes no `()`. – alecxe Sep 26 '16 at 19:56
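
To make the distinction concrete, here is a small sketch with a hand-built DataFrame (the bank names are placeholders, not FDIC data). `head` and `tail` are called with parentheses, `index` is read without them, and calling `df.index()` reproduces the kind of TypeError quoted above.

```python
import pandas as pd

# Placeholder data standing in for the scraped table.
df = pd.DataFrame(
    {
        "Bank Name": ["Bank A", "Bank B", "Bank C"],
        "City": ["Springfield", "Riverton", "Lakeside"],
    }
)

first_rows = df.head(2)  # method: needs parentheses
last_rows = df.tail(1)   # method: needs parentheses
row_labels = df.index    # attribute: no parentheses

# Calling the attribute raises TypeError because an Index is not callable.
try:
    df.index()
    index_call_failed = False
except TypeError:
    index_call_failed = True

print(first_rows)
print(last_rows)
print(row_labels)
```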

score 8 · Answer 2 · edited Apr 10 '19 at 14:23

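import pandas as pd
import html5lib  # not used directly; pandas can use it as an HTML parser

url = 'http://www.fdic.gov/bank/individual/failed/banklist.html'
dfs = pd.read_html(url)  # list of DataFrames, one per table on the page
df = pd.concat(dfs)      # stack every table in the list into one DataFrame
df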
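
A rough sketch of what `pd.concat` does with such a list, using two hand-made DataFrames in place of scraped tables (the data and variable names are illustrative only). With a single table in the list, the result has the same contents as `dfs[0]`; with several, the rows are stacked, which is only meaningful when the tables share columns. Passing `ignore_index=True` is an optional addition, not part of the answer, that renumbers the rows.

```python
import pandas as pd

# Stand-ins for the list returned by pd.read_html(url).
dfs = [
    pd.DataFrame({"Bank Name": ["Bank A", "Bank B"], "City": ["Springfield", "Riverton"]}),
    pd.DataFrame({"Bank Name": ["Bank C"], "City": ["Lakeside"]}),
]

stacked = pd.concat(dfs)                        # keeps each table's own row labels
renumbered = pd.concat(dfs, ignore_index=True)  # fresh 0..n-1 row labels

print(list(stacked.index))     # [0, 1, 0]
print(list(renumbered.index))  # [0, 1, 2]
```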

edited Apr 10 '19 at 14:23

Ajay Kumar Ganesh

answered Apr 10 '19 at 13:45

Nikhil Chawla

This was helpful to me. It solved the problem of a scraped table arriving as a list by converting that list into a dataframe. – Tony S Nov 30 '19 at 16:57
