0

I am having trouble trying to get the data from this web table. I was wondering if anyone could shed some light on my situation, thanks!

HTML:

enter image description here

Darian Moore
  • 33
  • 1
  • 6

1 Answers1

0

You want to use pandas.read_html() in order to collect an HTML table into a pandas DataFrame.

Usage:

>>> import pandas as pd
>>> pd.read_html(io)

Where from the docs:

io : str or file-like
A URL, a file-like object, or a raw string containing HTML.

For example, see with this Wikipedia URL:

In [0]: url = "https://en.wikipedia.org/wiki/List_of_Olympic_records_in_swimming"

In [1]: pd.read_html(url)[0]
Out[1]:
                      Event       Time                                               Name               Nation                Games            Date      Ref
0            50 m freestyle      21.30                                        César Cielo         Brazil (BRA)         2008 Beijing  16 August 2008      [4]
1           100 m freestyle      47.05                                     Eamon Sullivan      Australia (AUS)         2008 Beijing  13 August 2008   [5][A]
2           200 m freestyle    1:42.96                                     Michael Phelps  United States (USA)         2008 Beijing  12 August 2008      [6]
3           400 m freestyle    3:40.14                                           Sun Yang          China (CHN)          2012 London    28 July 2012      [7]
4          1500 m freestyle  ♦14:31.02                                           Sun Yang          China (CHN)          2012 London   4 August 2012      [8]
5          100 m backstroke     ♦51.85                                        Ryan Murphy  United States (USA)  2016 Rio de Janeiro  13 August 2016   [9][B]
6          200 m backstroke    1:53.41                                        Tyler Clary  United States (USA)          2012 London   2 August 2012     [10]
7        100 m breaststroke      57.13                                         Adam Peaty  Great Britain (GBR)  2016 Rio de Janeiro   7 August 2016     [11]
8        200 m breaststroke    2:07.22                                     Ippei Watanabe          Japan (JPN)  2016 Rio de Janeiro   9 August 2016  [12][C]
9           100 m butterfly      50.39                                   Joseph Schooling      Singapore (SGP)  2016 Rio de Janeiro  12 August 2016     [13]
10          200 m butterfly    1:52.03                                     Michael Phelps  United States (USA)         2008 Beijing  13 August 2008     [14]
11  200 m individual medley    1:54.23                                     Michael Phelps  United States (USA)         2008 Beijing  15 August 2008     [15]
12  400 m individual medley   ♦4:03.84                                     Michael Phelps  United States (USA)         2008 Beijing  10 August 2008     [16]
13  4×100 m freestyle relay   ♦3:08.24  Michael Phelps (47.51)Garrett Weber-Gale (47.0...  United States (USA)         2008 Beijing  11 August 2008     [17]
14  4×200 m freestyle relay    6:58.56  Michael Phelps (1:43.31)Ryan Lochte (1:44.28)R...  United States (USA)         2008 Beijing  13 August 2008     [18]
15     4×100 m medley relay    3:27.95  Ryan Murphy (51.85)Cody Miller (59.03) Michael...  United States (USA)  2016 Rio de Jainero  13 August 2016     [19]
arnaud
  • 3,293
  • 1
  • 10
  • 27
  • does it matter if the site requires a login first? – Darian Moore Mar 10 '20 at 15:00
  • If the website requires a login first, then better collect its raw HTML beforehands using traditional `requests` library (passing it the necessary credentials) and then use `read_html` to just parse the table from raw HTML. (and not from URL). – arnaud Mar 10 '20 at 15:18
  • You can accept the answer if you're satisfied with it! – arnaud Mar 11 '20 at 08:47