Scrape

Question

I am trying to create a simple weather forecast with Python in Eclipse. So far I have written this:

from bs4 import BeautifulSoup
import requests


def weather_forecast():
    url = 'https://www.yr.no/nb/v%C3%A6rvarsel/daglig-tabell/1-92416/Norge/Vestland/Bergen/Bergen'
    r = requests.get(url)  # Get request for contents of the page
    print(r.content)  # Outputs HTML code for the page
    soup = BeautifulSoup(r.content, 'html5lib')  # Parse the data with BeautifulSoup(HTML-string, html-parser)
    min_max = soup.select('min-max.temperature')  # Select all spans with a "min-max-temperature" attribute
    print(min_max.prettify())
    table = soup.find('div', attrs={'daily-weather-list-item__temperature'})
    print(table.prettify())

From a html-page with elements that looks like this:

I have found the path to the first temperature in the HTML-page's elements, but when I try and execute my code, and print to see if I have done it correctly, nothing is printed. My goal is to print a table with dates and corresponding temperatures, which seems like an easy task, but I do not know how to properly name the attribute or how to scrape them all from the HTML-page in one iteration.

The <span has two temperatures stored, one min and one max, here it just happens that they're the same.

I want to go into each <div class="daily-weather-list-item__temperature", collect the two temperatures and add them to a dictionary, how do I do this?

I have looked at this question on stackoverflow but I couldn't figure it out: Python BeautifulSoup - Scraping Div Spans and p tags - also how to get exact match on div name

If the code has script tags attached to it, then bs4 doesn't work half the time. If that is the case you'll have to use something like selenium. — Aaron Cloud, Jun 09 '21 at 21:42

score 1 · Accepted Answer · answered Jun 10 '21 at 00:23

You could use a dictionary comprehension. Loop over all the forecasts which have class daily-weather-list-item, then extract date from the datetime attribute of the time tags, and use those as keys; associate the keys with the maxmin info.

import requests
from bs4 import BeautifulSoup

def weather_forecast():
    url = 'https://www.yr.no/nb/v%C3%A6rvarsel/daglig-tabell/1-92416/Norge/Vestland/Bergen/Bergen'
    r = requests.get(url)  # Get request for contents of the page
    soup = BeautifulSoup(r.content, 'html5lib')  
    temps = {i.select_one('time')['datetime']:i.select_one('.min-max-temperature').get_text(strip= True) 
             for i in soup.select('.daily-weather-list-item')}
    return temps

weather_forecast()

Scrape

1 Answers1

Linked