
I would appreciate it if you could let me know how to convert an HTML table into a DataFrame.

import pandas as pd
df = pd.read_html('https://www.iasplus.com/en/resources/ifrs-topics/use-of-ifrs', header = None)

Error:

C:\Users\t\Anaconda3\python.exe C:/Users/t/Downloads/hyperopt12.py
Traceback (most recent call last):
  File "C:/Users/t/Downloads/hyperopt12.py", line 12, in <module>
    df = pd.read_html('https://www.iasplus.com/en/resources/ifrs-topics/use-of-ifrs', header = None)
  File "C:\Users\t\Anaconda3\lib\site-packages\pandas\io\html.py", line 1094, in read_html
    displayed_only=displayed_only)
  File "C:\Users\t\Anaconda3\lib\site-packages\pandas\io\html.py", line 916, in _parse
    raise_with_traceback(retained)
  File "C:\Users\t\Anaconda3\lib\site-packages\pandas\compat\__init__.py", line 420, in raise_with_traceback
    raise exc.with_traceback(traceback)
urllib.error.URLError: <urlopen error unknown url type: https>

Thanks in advance.

Ian Stapleton Cordasco
ebrahimi

1 Answer

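You need to find the right table on the page to read. `read_html` returns a list of DataFrame objects, one for each table it finds, so you have to select the one you want by its position in the list. See the pandas documentation for `read_html`.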

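import pandas as pd

# read_html returns a list with one DataFrame per table found on the page
tables = pd.read_html('https://www.iasplus.com/en/resources/ifrs-topics/use-of-ifrs', header=None)
df = tables[2]  # select the table of interest by its position in the list
df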
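
If you don't know in advance which position holds the table you want, a small helper can list the size of every table or pick the largest one, much like the `which.max(n.rows)` attempt in the R snippet in the comments. This is only a sketch: `summarize_tables` and `largest_table` are hypothetical names, not part of pandas, and they rely solely on each table having a `.shape` attribute, as DataFrames do.

```python
from typing import Any, List, Sequence, Tuple


def summarize_tables(tables: Sequence[Any]) -> List[Tuple[int, int, int]]:
    """Return (position, n_rows, n_cols) for each table in the list."""
    return [(i, t.shape[0], t.shape[1]) for i, t in enumerate(tables)]


def largest_table(tables: Sequence[Any]) -> Any:
    """Return the table with the most rows (the first one on ties)."""
    if not tables:
        raise ValueError("no tables to choose from")
    return max(tables, key=lambda t: t.shape[0])
```

With the `tables` list from above, `summarize_tables(tables)` shows which positions are worth inspecting, and `df = largest_table(tables)` avoids hard-coding `tables[2]`. Whether the largest table is the one you want depends on the page, so check the result.
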
run-out
  • Thanks. The same error: `tables = pd.read_html('https://www.iasplus.com/en/resources/ifrs-topics/use-of-ifrs', header = None) File "C:\Users\t\Anaconda3\lib\site-packages\pandas\io\html.py", line 1094, in read_html displayed_only=displayed_only) File "C:\Users\t\Anaconda3\lib\site-packages\pandas\io\html.py", line 916, in _parse raise_with_traceback(retained) File "C:\Users\t\Anaconda3\lib\site-packages\pandas\compat\__init__.py", line 420, in raise_with_traceback raise exc.with_traceback(traceback) urllib.error.URLError: ` – ebrahimi Feb 03 '19 at 12:59
  • This is working for me in Jupyter Notebook. I'm unable to reproduce your error. Sorry. Maybe this article will be useful. – run-out Feb 03 '19 at 13:12
  • Thanks. I also tried it in R, and I was not able to read it in R either. Again, the link provided in the comment section could be read in R too. – ebrahimi Feb 03 '19 at 13:29
  • `> library(XML) > library(RCurl) > library(rlist) > theurl <- getURL("https://www.iasplus.com/en/resources/ifrs-topics/use-of-ifrs",.opts = list(ssl.verifypeer = FALSE) ) Error in function (type, msg, asError = TRUE) : error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version > tables <- readHTMLTable(theurl) Error in readHTMLTable(theurl) : object 'theurl' not found > tables <- list.clean(tables, fun = is.null, recursive = FALSE) Error in list.clean(tables, fun = is.null, recursive = FALSE) : object 'tables' not found` – ebrahimi Feb 03 '19 at 13:31
  • `> n.rows <- unlist(lapply(tables, function(t) dim(t)[1])) Error in lapply(tables, function(t) dim(t)[1]) : object 'tables' not found > tables[[which.max(n.rows)]] Error: object 'tables' not found` – ebrahimi Feb 03 '19 at 13:32
  • There was a problem with my PyCharm installation. However, I accepted your answer. Thanks. – ebrahimi Feb 18 '19 at 13:55
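
The thread does not say what exactly was wrong with the PyCharm installation. One common cause of `unknown url type: https` is that the interpreter being run cannot load the `ssl` module, in which case `urllib` registers no HTTPS handler. Assuming that is the situation, a quick check like the following, run from the same interpreter that produced the error, may help narrow it down. The function name `https_diagnostics` is made up for this sketch.

```python
import sys
import urllib.request


def https_diagnostics():
    """Collect facts about the running interpreter's HTTPS support."""
    try:
        import ssl
        openssl_version = ssl.OPENSSL_VERSION
    except ImportError:
        # Without a working ssl module, urllib cannot open https:// URLs.
        openssl_version = None
    return {
        "interpreter": sys.executable,
        "ssl_importable": openssl_version is not None,
        "openssl_version": openssl_version,
        "urllib_has_https_handler": hasattr(urllib.request, "HTTPSHandler"),
    }


for key, value in https_diagnostics().items():
    print(f"{key}: {value}")
```

If `ssl_importable` is false when the script is launched from the IDE but true from a terminal where the Anaconda environment is activated, the IDE's interpreter or environment configuration is the likely culprit rather than the pandas code.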