I'm trying to scrape a table from a website, but I can't figure out how to do it with BeautifulSoup in Python. I'm not sure whether the table format is the problem; I basically want to turn this table into a CSV file.

from bs4 import BeautifulSoup
import requests

page = requests.get("https://spotwx.com/products/grib_index.php?model=hrrr_wrfprsf&lat=41.03399&lon=-73.76291&tz=America/New_York&display=table")
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())

Any advice on how to isolate this data table? I've checked many BeautifulSoup tutorials, but the HTML looks different from most of the references. Many thanks in advance for your help.

Frank Drin

1 Answer

Try this. The table on that site is generated dynamically, so you can't get the results using requests alone.

from selenium import webdriver
from bs4 import BeautifulSoup
import csv

link = "https://spotwx.com/products/grib_index.php?model=hrrr_wrfprsf&lat=41.03399&lon=-73.76291&tz=America/New_York&display=table"

with open("spotwx.csv", "w", newline='') as outfile:
    writer = csv.writer(outfile)
    # Column names are written by hand; they are not read from the page.
    writer.writerow(['DateTime','Tmp','Dpt','Rh','Wh','Wd','Wg','Apcp','Slp'])
    with webdriver.Chrome() as driver:
        driver.get(link)
        # Parse the HTML after the browser has rendered the table.
        soup = BeautifulSoup(driver.page_source, 'lxml')
        for item in soup.select("table#example tbody tr"):
            data = [elem.text for elem in item.select('td')]
            print(data)
            writer.writerow(data)
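
The row-extraction step above only needs an HTML string, so it can be tried without a browser. A minimal sketch, assuming the rendered page contains a table with id `example` (as in the selector above) and that the table has a `thead`; the helper name `table_to_csv` and the sample HTML are hypothetical and only stand in for `driver.page_source`:

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample standing in for driver.page_source; the real page's
# markup and values may differ.
SAMPLE_HTML = """
<table id="example">
  <thead><tr><th>DateTime</th><th>Tmp</th></tr></thead>
  <tbody>
    <tr><td>2017/10/06 10:00</td><td>18.2</td></tr>
    <tr><td>2017/10/06 11:00</td><td>19.0</td></tr>
  </tbody>
</table>
"""


def table_to_csv(html, selector="table#example"):
    """Return the first table matching `selector` as CSV text."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.select_one(selector)
    if table is None:
        raise ValueError(f"no table matches {selector!r}")

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Take the header from the page instead of hard-coding column names.
    writer.writerow(th.get_text(strip=True) for th in table.select("thead th"))
    # One CSV row per table body row.
    for row in table.select("tbody tr"):
        writer.writerow(td.get_text(strip=True) for td in row.select("td"))
    return buffer.getvalue()


csv_text = table_to_csv(SAMPLE_HTML)
print(csv_text)
```

Returning the CSV as text keeps the parsing separate from file handling; the result can be written to `spotwx.csv` afterwards if that is what is needed.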
SIM
  • Thanks a lot for your response. I am unfamiliar with WebDriver, but I don't need this to refresh in real time (and would prefer not to use WebDriver unless absolutely necessary). It appears that a plain requests pull shows the necessary data in the soup.prettify output, but I just can't figure out how to extract it into a table. Thanks again for the help! – Frank Drin Oct 06 '17 at 14:43
  • When I do try the code above, I get the error selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home – Frank Drin Oct 06 '17 at 14:44
  • The first one should work. If not, then go for the second one. 1. `driver = webdriver.Chrome('C:/path/to/chromedriver.exe')` 2. `driver = webdriver.Chrome('/path/to/chromedriver')`. By the way, you must adjust the path to match your system. Thanks. – SIM Oct 06 '17 at 15:49
  • Shahin, this is great; it works perfectly. Thank you very much for the assistance. I just wanted to confirm that there is no way to scrape this without launching it in Chrome, as I DO see the data that I'm trying to isolate in the requests.get pull. Thanks – Frank Drin Oct 06 '17 at 16:13
  • I'm not a great fan of Selenium either. However, when it comes to dealing with JavaScript-enabled sites, Selenium is second to none and `requests` is of no help. By the way, you may like a headless browser such as `PhantomJS`, but there are several issues with that kind of headless browser automation. – SIM Oct 06 '17 at 16:35
  • Great, thank you again for all of your help; it is really appreciated. Have a great weekend. – Frank Drin Oct 06 '17 at 16:39
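
On the point raised in the comments, that the values are visible in the `requests.get` output: one possible explanation is that the page ships its rows inside a `<script>` block as a JavaScript array and builds the table in the browser. This has not been verified for this site. A minimal sketch under that assumption, using only the standard library; the variable name `aDataSet`, the helper `extract_js_array`, and the sample markup are all hypothetical:

```python
import json
import re

# Hypothetical sample standing in for page.text; the real page may embed its
# data differently, or not at all.
SAMPLE_HTML = """
<script>
var aDataSet = [
  ["2017/10/06 10:00", "18.2", "12.1"],
  ["2017/10/06 11:00", "19.0", "12.4"]
];
</script>
"""


def extract_js_array(html, var_name="aDataSet"):
    """Return the array assigned to `var_name` in an inline script.

    Only works when the array literal is also valid JSON (double-quoted
    strings, no trailing commas) and no cell contains the text "];".
    """
    pattern = re.compile(
        r"var\s+" + re.escape(var_name) + r"\s*=\s*(\[.*?\])\s*;",
        re.DOTALL,
    )
    match = pattern.search(html)
    if match is None:
        raise ValueError(f"no array named {var_name!r} found")
    return json.loads(match.group(1))


rows = extract_js_array(SAMPLE_HTML)
print(rows[0])
```

If the real page does contain such an array, the resulting list of rows can be passed to `csv.writer(...).writerows(rows)`. If it does not, the browser-based approach in the answer remains the fallback.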