
I'm trying to scrape the date, and the minimum and maximum temperatures from the site https://www.ipma.pt/pt/otempo/prev.localidade.hora/#Porto&Gondomar. I want to find all of the divs with the date class and all the spans with the tempMin and tempMax classes, so I wrote

# Imports assumed from the aliases used below (uReq, soup)
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

pagina2 = "https://www.ipma.pt/pt/otempo/prev.localidade.hora/#Porto&Gondomar"
client2 = uReq(pagina2)
pagina2bs = soup(client2.read(), "html.parser")
client2.close()

data = pagina2bs.find_all("div", class_="date")
minT = pagina2bs.find_all("span", class_="tempMin")
maxT = pagina2bs.find_all("span", class_="tempMax")

but all I get are empty lists. I've compared this with similar code and I can't see where I made a mistake, since there are clearly tags with these classes.

Onshoe
  • The page uses JavaScript to load its content dynamically. You should use another tool, such as Selenium, instead of a plain request. – Arthur Pereira Dec 08 '20 at 16:41
  • Check whether [this](https://stackoverflow.com/questions/65186906/why-is-html-returned-by-requests-different-from-the-real-page-html/65187344#65187344) answers your question. – Arthur Pereira Dec 08 '20 at 16:42

1 Answer


From my perspective the problem is the content of the pagina2bs variable, not your find_all calls: you are passing the right arguments to find_all, but the HTML you downloaded does not contain those tags.
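One thing that may help explain this, as a rough sketch: the part of the URL after `#` is a fragment, and HTTP clients do not send fragments to the server. So the server never sees `Porto&Gondomar`; presumably (this is my assumption about how the page works) the page's own JavaScript reads it in the browser and then loads the forecast. The variable names `requested` and `fragment` below are just for illustration.

```python
from urllib.parse import urlsplit

url = "https://www.ipma.pt/pt/otempo/prev.localidade.hora/#Porto&Gondomar"
parts = urlsplit(url)

# What an HTTP client actually asks the server for: scheme, host and path.
requested = f"{parts.scheme}://{parts.netloc}{parts.path}"

# The fragment stays on the client side and is never part of the request.
fragment = parts.fragment

print(requested)
print(fragment)
```

If that assumption holds, any plain HTTP download gets the same skeleton page regardless of the location after the `#`, which would match the empty lists you are seeing.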

Use Selenium to get the rendered HTML of that website.

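from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

# Run Chrome without opening a window.
chrome_options = Options()
chrome_options.add_argument("--headless")

# Selenium 4 takes the chromedriver path through a Service object
# (the older executable_path= argument is deprecated there).
service = Service('C:/Users/**USERNAME**/Desktop/chromedriver.exe')
driver = webdriver.Chrome(service=service, options=chrome_options)

startUrl = "https://www.ipma.pt/pt/otempo/prev.localidade.hora/#Porto&Gondomar"
driver.get(startUrl)
html = driver.page_source  # HTML as currently rendered by the browser
driver.quit()              # close the browser once the HTML is saved

# The html5lib parser must be installed (pip install html5lib).
soup = bs(html, features="html5lib")
divs = soup.find_all("div", class_="date")
print(divs)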

Install all the needed packages and the Selenium Chrome driver (chromedriver). Then point the code at your chromedriver executable, as I did with the path on my machine.
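Once you have the rendered HTML, pairing each date with its minimum and maximum could look roughly like the sketch below. The `sample_html` string is invented markup that only reuses the class names from the question (`date`, `tempMin`, `tempMax`); the real page layout may differ, and `extract_forecast` is a hypothetical helper name. It also assumes the three lists come back in the same order and with the same length, which you should verify on the real page.

```python
from bs4 import BeautifulSoup

# Invented sample standing in for driver.page_source (not the real IPMA markup).
sample_html = """
<div class="day"><div class="date">Ter, 8</div>
  <span class="tempMin">7</span><span class="tempMax">14</span></div>
<div class="day"><div class="date">Qua, 9</div>
  <span class="tempMin">9</span><span class="tempMax">15</span></div>
"""

def extract_forecast(html):
    """Return a list of (date, min, max) text tuples found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    dates = soup.find_all("div", class_="date")
    mins = soup.find_all("span", class_="tempMin")
    maxs = soup.find_all("span", class_="tempMax")
    # zip stops at the shortest list, so mismatched lengths drop trailing items.
    return [
        (d.get_text(strip=True), lo.get_text(strip=True), hi.get_text(strip=True))
        for d, lo, hi in zip(dates, mins, maxs)
    ]

forecast = extract_forecast(sample_html)
for row in forecast:
    print(row)
```

With the Selenium code above you would call `extract_forecast(html)` instead of passing the sample string.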

  • You're right, pagina2bs doesn't contain that part. Here is the variable (https://pastebin.com/bWC3AhGQ). Is it because the HTML is too big or something like that? – Onshoe Dec 08 '20 at 16:55
  • No, the HTML is not too big; it just does not contain the right information. I have adapted my answer. Now you can use the code to get the information you want from that site! – Christoph_Raidl Dec 09 '20 at 11:27