How to read html table as dataframe (urllib.error.URLError: )?

Question

I would appreciate it if you could let me know how to convert an HTML table into a DataFrame.

import pandas as pd
df = pd.read_html('https://www.iasplus.com/en/resources/ifrs-topics/use-of-ifrs', header = None)

Error:

C:\Users\t\Anaconda3\python.exe C:/Users/t/Downloads/hyperopt12.py
Traceback (most recent call last):
  File "C:/Users/t/Downloads/hyperopt12.py", line 12, in <module>
    df = pd.read_html('https://www.iasplus.com/en/resources/ifrs-topics/use-of-ifrs', header = None)
  File "C:\Users\t\Anaconda3\lib\site-packages\pandas\io\html.py", line 1094, in read_html
    displayed_only=displayed_only)
  File "C:\Users\t\Anaconda3\lib\site-packages\pandas\io\html.py", line 916, in _parse
    raise_with_traceback(retained)
  File "C:\Users\t\Anaconda3\lib\site-packages\pandas\compat\__init__.py", line 420, in raise_with_traceback
    raise exc.with_traceback(traceback)
urllib.error.URLError: <urlopen error unknown url type: https>

Thanks in advance.
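The message `unknown url type: https` comes from urllib, not from pandas. `urllib.request` only provides an HTTPS handler when the interpreter can import its `ssl` module. If `ssl` cannot be loaded, for example because the interpreter cannot find its OpenSSL libraries, every `https://` URL is rejected this way. That would fit the asker's later remark that the cause was the PyCharm setup. Below is a minimal diagnostic sketch under that assumption. The function name `https_diagnostics` is hypothetical. Run it with the same interpreter PyCharm uses.

```python
import sys
import urllib.request


def https_diagnostics() -> dict:
    """Collect facts that help explain 'unknown url type: https' for this interpreter."""
    # Which python.exe is actually running (PyCharm may point at a different one)
    info = {"python": sys.executable}
    try:
        import ssl  # fails if the interpreter cannot load its OpenSSL libraries
        info["ssl"] = ssl.OPENSSL_VERSION
    except ImportError as exc:
        info["ssl"] = f"unavailable ({exc})"
    # HTTPSHandler only exists when http.client has HTTPS support, which needs ssl;
    # without it, urlopen treats https:// as an unknown URL type.
    info["https_handler"] = hasattr(urllib.request, "HTTPSHandler")
    return info


if __name__ == "__main__":
    for key, value in https_diagnostics().items():
        print(f"{key}: {value}")
```

If `https_handler` is `False` or `ssl` reports as unavailable, the fix lies in the interpreter or IDE configuration rather than in the `read_html` call.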

@anky_91 Thanks. It is 0.24.0. It could read this page: http://www.espn.com/nhl/statistics/player/_/stat/points/sort/points/year/2015/seasontype/2 — ebrahimi, Feb 03 '19 at 12:15
@anky_91 I could read the link provided in the last comment but I'm not able to read the link provided in the question. — ebrahimi, Feb 03 '19 at 12:18

score 1 · Accepted Answer · answered Feb 03 '19 at 12:43


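You need to find the right table on the page to read. `read_html` returns a list of DataFrame objects, one for each table it finds on the page, so you have to pick the one you want from that list. See the `pandas.read_html` documentation.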

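import pandas as pd
# read_html returns a list of DataFrames, one per table found on the page
tables = pd.read_html('https://www.iasplus.com/en/resources/ifrs-topics/use-of-ifrs', header=None)
df = tables[2]  # the table of interest on this page
df  # displays the DataFrame in Jupyter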
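To find "the right table", it helps to list what `read_html` returned before choosing an index. The sketch below uses a hypothetical helper, `describe_tables`, and small stand-in DataFrames so it runs without network access. With a working connection, `tables` would come from the `read_html` call above.

```python
from typing import List, Tuple

import pandas as pd


def describe_tables(tables: List[pd.DataFrame]) -> List[Tuple[int, int, int]]:
    """Return (index, n_rows, n_cols) for every DataFrame in the list read_html returned."""
    return [(i, t.shape[0], t.shape[1]) for i, t in enumerate(tables)]


if __name__ == "__main__":
    # Stand-in for: tables = pd.read_html('https://www.iasplus.com/en/resources/ifrs-topics/use-of-ifrs')
    tables = [
        pd.DataFrame({"a": [1]}),
        pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}),
    ]
    for i, rows, cols in describe_tables(tables):
        print(f"tables[{i}]: {rows} rows x {cols} columns")
```

`read_html` also accepts a `match` argument (a string or regular expression). Only tables whose text matches it are returned, which can replace a hard-coded index such as `tables[2]`.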


run-out


Thanks, but I get the same error: `tables = pd.read_html('https://www.iasplus.com/en/resources/ifrs-topics/use-of-ifrs', header = None) File "C:\Users\t\Anaconda3\lib\site-packages\pandas\io\html.py", line 1094, in read_html displayed_only=displayed_only) File "C:\Users\t\Anaconda3\lib\site-packages\pandas\io\html.py", line 916, in _parse raise_with_traceback(retained) File "C:\Users\t\Anaconda3\lib\site-packages\pandas\compat\__init__.py", line 420, in raise_with_traceback raise exc.with_traceback(traceback) urllib.error.URLError: ` – ebrahimi Feb 03 '19 at 12:59
This is working for me in Jupyter Notebook. I'm unable to reproduce your error. Sorry. Maybe this article will be useful. – run-out Feb 03 '19 at 13:12
Thanks. I also tried it in R and was not able to read it there either. Again, the link provided in the comment section could be read in R. – ebrahimi Feb 03 '19 at 13:29
`> library(XML) > library(RCurl) > library(rlist) > theurl <- getURL("https://www.iasplus.com/en/resources/ifrs-topics/use-of-ifrs",.opts = list(ssl.verifypeer = FALSE) ) Error in function (type, msg, asError = TRUE) : error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version > tables <- readHTMLTable(theurl) Error in readHTMLTable(theurl) : object 'theurl' not found > tables <- list.clean(tables, fun = is.null, recursive = FALSE) Error in list.clean(tables, fun = is.null, recursive = FALSE) : object 'tables' not found` – ebrahimi Feb 03 '19 at 13:31
`> n.rows <- unlist(lapply(tables, function(t) dim(t)[1])) Error in lapply(tables, function(t) dim(t)[1]) : object 'tables' not found > tables[[which.max(n.rows)]] Error: object 'tables' not found` – ebrahimi Feb 03 '19 at 13:32
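The R attempt above picks the table with the most rows via `tables[[which.max(n.rows)]]`. A minimal Python sketch of the same idea, applied to the list `read_html` returns, is shown below. The helper name `largest_table` is hypothetical, and "most rows" is only a heuristic for finding the main table on a page.

```python
from typing import List

import pandas as pd


def largest_table(tables: List[pd.DataFrame]) -> pd.DataFrame:
    """Return the DataFrame with the most rows, like R's tables[[which.max(n.rows)]]."""
    if not tables:
        raise ValueError("no tables to choose from")
    # len(df) is the row count; max() keeps the first table on ties, as which.max does
    return max(tables, key=len)
```

Usage would be `df = largest_table(pd.read_html(url))`, once HTTPS access works in the interpreter.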

There was a problem with my PyCharm installation. Anyway, I accepted your answer. Thanks. – ebrahimi Feb 18 '19 at 13:55
