Convert website table to pandas DataFrame (BeautifulSoup doesn't recognize the table)

Question

I want to convert a table on a website into a pandas DataFrame, but BeautifulSoup doesn't recognize it as a table. Below is the code I tried, with no luck.

from bs4 import BeautifulSoup
import requests
import pandas as pd

url = 'https://www.ndbc.noaa.gov/ship_obs.php'
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers)

soup = BeautifulSoup(response.content, 'html.parser')
tables = soup.find_all('table', rules = 'all')
#tables =soup.find_all("table",{"style":"color:#333399;"}) #instead of above line to specify table with no luck!
df = pd.read_html(table, skiprows=2, flavor='bs4')
df.head()

I also tried the code below, again with no luck:

df = pd.read_html('https://www.ndbc.noaa.gov/ship_obs.php')
print(df)

Well, the data is not stored in a table. It's a bunch of `span` tags. I guess this will help you. - mosc9575, Feb 20 '21 at 20:19

score 3 · Accepted Answer · answered Feb 20 '21 at 19:41

Your data is not in a <table> tag; it is spread across multiple <span> tags inside a <pre> block.

You can parse these to a dataframe like so:

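import bs4
import pandas as pd
import requests

url = "https://www.ndbc.noaa.gov/ship_obs.php"
html = requests.get(url).text
# The observations sit in <span> tags inside the first <pre> block.
spans = bs4.BeautifulSoup(html, 'html.parser').find('pre').find_all("span")
# Split each span on whitespace so every token becomes one cell.
print(pd.DataFrame([span.get_text().split() for span in spans]))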

Output:

      0     1     2      3     4     5   ...    40    41    42    43    44    45
0    SHIP  HOUR   LAT    LON  WDIR  WSPD  ...    °T    ft   sec    °T   Acc   Ice
1    SHIP    19  46.5  -72.3   260   5.1  ...  None  None  None  None  None  None
2    SHIP    19  46.8  -71.2   110   2.9  ...  None  None  None  None  None  None
3    SHIP    19  47.4  -61.8    40  18.1  ...  None  None  None  None  None  None
4    SHIP    19  47.7  -53.2    40   8.0  ...  None  None  None  None  None  None
..    ...   ...   ...    ...   ...   ...  ...   ...   ...   ...   ...   ...   ...
170  SHIP    19  17.6  -62.4   100  20.0  ...  None  None  None  None  None  None
171  SHIP    19  25.8  -78.0    40  24.1  ...  None  None  None  None  None  None
172  SHIP    19   1.5  104.8    20  22.0  ...  None  None  None  None  None  None
173  SHIP    19  57.9    1.2   180     -  ...  None  None  None  None  None  None
174  SHIP    19  35.1  -10.0   310  24.1  ...  None  None  None  None  None  None

[175 rows x 46 columns]
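
The output above pads every data row with None because the header span splits into more tokens than the data spans do. One way to tidy that up is sketched below. It assumes the data rows all share the most common token count, which has not been checked against the live page; the helper name `spans_to_frame` and the sample HTML are hypothetical and only imitate the layout.

```python
from collections import Counter

import bs4
import pandas as pd


def spans_to_frame(html: str) -> pd.DataFrame:
    """Tokenize each <span> in the first <pre> and keep rows of the most common width."""
    pre = bs4.BeautifulSoup(html, "html.parser").find("pre")
    if pre is None:
        raise ValueError("no <pre> block found")
    rows = [span.get_text().split() for span in pre.find_all("span")]
    rows = [r for r in rows if r]  # drop empty spans
    if not rows:
        return pd.DataFrame()
    # Assumption: data rows share one width; header/unit spans differ from it.
    width = Counter(len(r) for r in rows).most_common(1)[0][0]
    return pd.DataFrame([r for r in rows if len(r) == width])


# Hypothetical miniature of the page layout, for illustration only.
sample = """<pre>
<span>SHIP HOUR LAT LON  hr  deg  deg</span>
<span>SHIP 19 46.5 -72.3</span>
<span>SHIP 19 46.8 -71.2</span>
</pre>"""
frame = spans_to_frame(sample)
print(frame)
```

The header span is dropped here rather than parsed, so column names still have to be assigned by hand.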

score 2 · Answer 2 · answered Feb 20 '21 at 20:14

Here is a slightly different approach. Note the column count as well: this gives 24 columns rather than 46. I skipped the lines at the top, so you'll have to build the column headers yourself and clean up the last row.

import io

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.ndbc.noaa.gov/ship_obs.php'
page = requests.get(url)
soup = BeautifulSoup(page.text, "html.parser")
tablecontent = soup.find('pre')
data = BeautifulSoup(tablecontent.text, "html.parser")
s = io.StringIO(data.text)
# Split on runs of whitespace and skip the first three lines at the top.
df = pd.read_csv(s, sep=r'\s+', engine='python', skiprows=3, header=None)

Output:

        0     1             2         3    4     5    6     7     8      9  ...    14    15    16    17    18    19    20    21    22     23
  0  SHIP    19          47.4     -61.8   40  18.1    -     -     -  29.82  ...     -     -     -     -     -     -     -     -  ----  -----
  1  SHIP    19          47.7     -53.2   40   8.0    -     -     -  29.76  ...     -     -     -     -     -     -     -     -  ----  -----
  2  SHIP    19          47.8     -54.1   50  13.0    -     -     -  29.75  ...     -     -     -     -     -     -     -     -  ----  -----
  3  SHIP    19          48.2     -53.4   50  13.0    -     -     -  29.78  ...     -     -     -     -     -     -     -     -  ----  -----
  4  SHIP    19          46.8     -71.2  110   2.9    -     -     -  30.03  ...     -     -     -     -     -     -     -     -  ----  -----
 ..   ...   ...           ...       ...  ...   ...  ...   ...   ...    ...  ...   ...   ...   ...   ...   ...   ...   ...   ...   ...    ...
178  SHIP    19          25.8     -78.0   40  24.1    -   4.9   4.0  30.08  ...    11     5     -     -     -     -     -     -  ----  -----
179  SHIP    19           1.5     104.8   20  22.0    -     -     -  29.87  ...    11     5     -     -     -     -     -     -  ----  -----
180  SHIP    19          57.9       1.2  180     -    -     -     -  29.35  ...     5     -     -     -     -     -     -     -  ----  -----
181  SHIP    19          35.1     -10.0  310  24.1    -   6.6   6.0  29.68  ...     5     8  14.8  10.0   310     -     -     -  ----  -----
182   182  ship  observations  reported  for  1900  GMT  None  None   None  ...  None  None  None  None  None  None  None  None  None   None
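
The clean-up of the last row could look roughly like the sketch below. It assumes that every observation row starts with the literal `SHIP` and that a cell made up only of dashes means "no value"; both are read off the output above, not from any NDBC documentation. The name `clean_obs` and the three-line sample are hypothetical.

```python
import io

import numpy as np
import pandas as pd


def clean_obs(df: pd.DataFrame) -> pd.DataFrame:
    """Drop non-observation rows and mark dash placeholders as missing."""
    # Assumption: every observation row starts with the literal "SHIP".
    obs = df[df[0] == "SHIP"].reset_index(drop=True)
    # Assumption: a cell consisting only of dashes ("-", "----") is a gap.
    return obs.replace(r"^-+$", np.nan, regex=True)


# Hypothetical sample in the same whitespace-separated layout.
sample = """\
SHIP 19 47.4 -61.8 40 18.1 -
SHIP 19 57.9 1.2 180 - -
2 ship observations reported for 1900 GMT
"""
raw = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None)
cleaned = clean_obs(raw)
print(cleaned)
```

Negative numbers such as -61.8 are left alone because the pattern only matches cells that contain nothing but dashes. The remaining values are still strings, so a numeric conversion would be a separate step.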
