Since you're trying to scrape just one element from the whole HTML (if so), there's no need to use the find_all()/findAll() methods. Instead, you can use the find() or select_one() methods that bs4 provides: find() grabs the first element matching a tag name and attributes, and select_one() grabs the first element matching a CSS selector. You can use SelectorGadget to find CSS selectors.
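To make the difference concrete, here's a minimal sketch on a made-up HTML snippet. The markup is hypothetical (only the ids mirror the ones used in the weather script), and I'm using the built-in html.parser so nothing extra needs to be installed:

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; not the real Google page.
html = """
<div id="wob_wc">
  <span id="wob_tm" class="temp">47</span>
  <span id="wob_hm" class="stat">49%</span>
  <span id="wob_ws" class="stat">9 mph</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find(): first element matching a tag name and attributes
by_find = soup.find("span", id="wob_tm")

# select_one(): first element matching a CSS selector
by_css = soup.select_one("#wob_tm")

# find_all(): a list of every match, only needed when you want several elements
all_stats = soup.find_all("span", class_="stat")

# Both single-element methods return None when nothing matches
missing = soup.select_one("#does_not_exist")

temperature = by_find.text
stats = [tag.text for tag in all_stats]
print(temperature, by_css.text, stats, missing)
```

Both find() and select_one() hand back a single Tag (or None), so there's no list to index into.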
For example, say you want to scrape weather data from the Google Search answer box. You can do this in two ways:
- Using a custom script. I scraped a bit more than one element just to show that it's a straightforward process.
- Using the Google Direct Answer Box API from SerpApi. It's a paid API with a free trial of 5,000 searches. Check out the playground to test it out.
Code and full example in the online IDE (works on other weather searches as well):
from bs4 import BeautifulSoup
import requests
import lxml  # not used directly; imported so a missing 'lxml' parser fails early

# Browser-like User-Agent; without one Google may return different markup
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

# Pass the query via params so requests URL-encodes the space for us
response = requests.get('https://www.google.com/search', params={'q': 'london weather'}, headers=headers).text
soup = BeautifulSoup(response, 'lxml')

# Each value sits in an element with its own id, so select_one() is enough
weather_condition = soup.select_one('#wob_dc').text
temperature = soup.select_one('#wob_tm').text
precipitation = soup.select_one('#wob_pp').text
humidity = soup.select_one('#wob_hm').text
wind = soup.select_one('#wob_ws').text
current_time = soup.select_one('#wob_dts').text

print(f'Weather condition: {weather_condition}\nTemperature: {temperature}°F\nPrecipitation: {precipitation}\nHumidity: {humidity}\nWind speed: {wind}\nCurrent time: {current_time}')

# output:
'''
Weather condition: Mostly cloudy
Temperature: 47°F
Precipitation: 79%
Humidity: 49%
Wind speed: 9 mph
Current time: Thursday 10:00 AM
'''
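One caveat with the script above: select_one() returns None when nothing matches, so calling .text on the result raises AttributeError (for instance, if the markup changes). Here's a rough sketch of a small guard. select_text is a hypothetical helper name of mine, not part of bs4, and the HTML is a made-up stand-in for the real page:

```python
from bs4 import BeautifulSoup


def select_text(soup, selector, default=None):
    """Return the stripped text of the first match, or `default` if there is none."""
    tag = soup.select_one(selector)
    return tag.get_text(strip=True) if tag is not None else default


# Hypothetical markup: the temperature is present, the wind speed is not.
html = '<div><span id="wob_tm"> 47 </span></div>'
soup = BeautifulSoup(html, "html.parser")

temperature = select_text(soup, "#wob_tm")
wind = select_text(soup, "#wob_ws", default="n/a")

print(temperature, wind)
```

This way a missing element gives you a placeholder value instead of a traceback.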
Basically, the main difference is that with the Google Direct Answer Box API everything is already done for the end user and returned as JSON, so you don't need to tinker with HTML elements to get the desired output or guess why the output differs from what you expected.
Code to scrape weather answer box:
from serpapi import GoogleSearch
import os

params = {
    "engine": "google",
    "q": "london weather",
    "api_key": os.getenv("API_KEY"),  # API key is read from the environment
    "hl": "en",
}

search = GoogleSearch(params)
results = search.get_dict()

# All weather fields live under the 'answer_box' key
answer_box = results['answer_box']

loc = answer_box['location']
weather_date = answer_box['date']
weather = answer_box['weather']
temp = answer_box['temperature']
unit = answer_box['unit']
precipitation = answer_box['precipitation']
humidity = answer_box['humidity']
wind = answer_box['wind']
forecast = answer_box['forecast']

print(f'{loc}\n{weather_date}\n{weather}\n{temp}\n{unit}\n{precipitation}\n{humidity}\n{wind}\n\n{forecast}')
# output:
'''
London, UK
Thursday 7:00 AM
Mostly sunny
53
Fahrenheit
2%
89%
1 mph
[{'day': 'Thursday', 'weather': 'Mostly cloudy', 'temperature': {'high': '70', 'low': '53'}}]
...
'''
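Since the result is a plain dict, you can also read it defensively with dict.get() in case a query returns no answer box. A small sketch of that, using sample data copied from the output above rather than a live API call (a real response may contain more keys, and summarize is just a hypothetical helper name):

```python
# Sample data mirroring the output above; NOT a live API response.
results = {
    "answer_box": {
        "location": "London, UK",
        "date": "Thursday 7:00 AM",
        "weather": "Mostly sunny",
        "temperature": "53",
        "unit": "Fahrenheit",
        "precipitation": "2%",
        "humidity": "89%",
        "wind": "1 mph",
        "forecast": [
            {"day": "Thursday", "weather": "Mostly cloudy",
             "temperature": {"high": "70", "low": "53"}},
        ],
    }
}


def summarize(results):
    """Build a short text summary; tolerate a missing answer box or missing keys."""
    box = results.get("answer_box") or {}
    if not box:
        return "no answer box in response"
    headline = (f"{box.get('location', '?')}: {box.get('weather', '?')}, "
                f"{box.get('temperature', '?')} {box.get('unit', '')}").strip()
    lines = [headline]
    for day in box.get("forecast", []):
        temps = day.get("temperature", {})
        lines.append(f"{day.get('day', '?')}: {temps.get('low', '?')}-{temps.get('high', '?')}")
    return "\n".join(lines)


summary = summarize(results)
print(summary)
```

Passing an empty dict to summarize() returns the fallback message instead of raising KeyError.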
Disclaimer: I work for SerpApi.