BeautifulSoup "AttributeError: 'NoneType' object has no attribute 'text'"

Question

I was scraping a Google weather search results page with bs4, and Python can't find a <span> tag even though it is there. How can I solve this problem?

I tried to find this <span> with the class and the id, but both failed.

<div id="wob_dcp">
    <span class="vk_gy vk_sh" id="wob_dc">Clear with periodic clouds</span>    
</div>

Above is the HTML I was trying to scrape from the page, using this code:

response = requests.get('https://www.google.com/search?hl=ja&ei=coGHXPWEIouUr7wPo9ixoAg&q=%EC%9D%BC%EB%B3%B8+%E6%A1%9C%E5%B7%9D%E5%B8%82%E7%9C%9F%E5%A3%81%E7%94%BA%E5%8F%A4%E5%9F%8E+%EB%82%B4%EC%9D%BC+%EB%82%A0%EC%94%A8&oq=%EC%9D%BC%EB%B3%B8+%E6%A1%9C%E5%B7%9D%E5%B8%82%E7%9C%9F%E5%A3%81%E7%94%BA%E5%8F%A4%E5%9F%8E+%EB%82%B4%EC%9D%BC+%EB%82%A0%EC%94%A8&gs_l=psy-ab.3...232674.234409..234575...0.0..0.251.929.0j6j1......0....1..gws-wiz.......35i39.yu0YE6lnCms')
soup = BeautifulSoup(response.content, 'html.parser')

tomorrow_weather = soup.find('span', {'id': 'wob_dc'}).text

But this code fails with the following error:

Traceback (most recent call last):
  File "C:\Users\sungn_000\Desktop\weather.py", line 23, in <module>
    tomorrow_weather = soup.find('span', {'id': 'wob_dc'}).text
AttributeError: 'NoneType' object has no attribute 'text'

Please solve this error.

Please post a valid url of the page instead of a micro-image. Thank you! — DirtyBit, Mar 26 '19 at 07:34
Doesn't work for me either with chrome. Is it supposed to get an actual page or just the search results? That id is not present when I inspect. — QHarr, Mar 26 '19 at 08:07

pawelbylina · Accepted Answer · 2019-03-26T08:00:12.323

This is because the weather section is rendered by the browser via JavaScript. When you use requests, you only get the raw HTML of the page, which doesn't contain what you need. To parse a page whose elements are rendered by the browser, use something like selenium or requests-html:

from bs4 import BeautifulSoup
from requests_html import HTMLSession
session = HTMLSession()
response = session.get('https://www.google.com/search?hl=en&ei=coGHXPWEIouUr7wPo9ixoAg&q=%EC%9D%BC%EB%B3%B8%20%E6%A1%9C%E5%B7%9D%E5%B8%82%E7%9C%9F%E5%A3%81%E7%94%BA%E5%8F%A4%E5%9F%8E%20%EB%82%B4%EC%9D%BC%20%EB%82%A0%EC%94%A8&oq=%EC%9D%BC%EB%B3%B8%20%E6%A1%9C%E5%B7%9D%E5%B8%82%E7%9C%9F%E5%A3%81%E7%94%BA%E5%8F%A4%E5%9F%8E%20%EB%82%B4%EC%9D%BC%20%EB%82%A0%EC%94%A8&gs_l=psy-ab.3...232674.234409..234575...0.0..0.251.929.0j6j1......0....1..gws-wiz.......35i39.yu0YE6lnCms')
soup = BeautifulSoup(response.content, 'html.parser')

tomorrow_weather = soup.find('span', {'id': 'wob_dc'}).text
print(tomorrow_weather)

Output:

pawel@pawel-XPS-15-9570:~$ python test.py
Clear with periodic clouds
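
Whichever way the page is fetched, `soup.find(...)` returns `None` when no element matches, and calling `.text` on that `None` is what raises the `AttributeError` in the question. Below is a minimal sketch of a guard that makes this case explicit. The helper name `text_by_id` and the second sample HTML string are illustrative assumptions, not part of the answer above.

```python
from bs4 import BeautifulSoup


def text_by_id(html, element_id):
    """Return the stripped text of the element with this id, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(id=element_id)  # find() returns None when nothing matches
    if element is None:
        return None
    return element.get_text(strip=True)


# The markup from the question: the span is present, so its text is returned.
with_span = (
    '<div id="wob_dcp">'
    '<span class="vk_gy vk_sh" id="wob_dc">Clear with periodic clouds</span>'
    "</div>"
)
# A hypothetical stand-in for a response that lacks the weather widget.
without_span = "<html><body><p>No weather widget here</p></body></html>"

print(text_by_id(with_span, "wob_dc"))
print(text_by_id(without_span, "wob_dc"))
```

When the helper returns `None`, it is worth searching `response.text` for the id before assuming the selector itself is wrong.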

Thanks a lot! With `requests-html`, it finally worked! – sjk1204 Mar 26 '19 at 08:05

Pravin · Answer 2 · 2019-03-26T07:45:45.457

>>> from bs4 import BeautifulSoup
>>> a = '<div id="wob_dcp">\n    <span class="vk_gy vk_sh" id="wob_dc">Clear with periodic clouds</span>    \n</div>'
>>> soup = BeautifulSoup(a, 'html.parser')
>>> soup.find("span", id="wob_dc").text
'Clear with periodic clouds'

Try this out.

edited Mar 26 '19 at 07:45

answered Mar 26 '19 at 07:42

Obviously this works, but this is not the answer to the question. He requires it using `requests`. – DirtyBit Mar 26 '19 at 07:44
If you inspect the response.content of requests.get, you won't find the span in the content. That is because JavaScript renders the HTML, so you don't get what you intended. – Pravin Mar 26 '19 at 07:53

score 0 · Answer 3 · answered Sep 10 '21 at 05:36

It's not rendered via JavaScript, as pawelbylina suggested, and you don't have to use requests-html or selenium: everything needed is already in the HTML, and rendering the page would slow down the scraping process a lot.

The likely cause is that no user-agent is specified, so Google blocks the request and you receive different HTML containing some sort of error. The default requests user-agent is python-requests; Google recognizes it and blocks the request because it doesn't look like a visit from a real user. Check what your user-agent is.

Pass a user-agent in the request headers:

headers = {
  "User-Agent":
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

requests.get("YOUR_URL", headers=headers)
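
To confirm which User-Agent would actually be sent, one option is to build the request without sending it and inspect the prepared headers. This is a sketch only: the helper name `outgoing_user_agent` is made up for illustration, and nothing here touches the network.

```python
import requests


def outgoing_user_agent(headers=None):
    """Return the User-Agent header requests would send, without sending anything."""
    session = requests.Session()
    request = requests.Request("GET", "https://www.google.com/search", headers=headers)
    # prepare_request() merges the session's default headers with the ones passed in.
    prepared = session.prepare_request(request)
    return prepared.headers["User-Agent"]


custom_headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

default_agent = outgoing_user_agent()               # library default, python-requests/<version>
custom_agent = outgoing_user_agent(custom_headers)  # the browser-like string from above

print(default_agent)
print(custom_agent)
```

The first value shows why a server can tell the request apart from a browser visit; the second shows that the header passed in replaces the default.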

This is what you're looking for. Use select_one() to grab just one element:

soup.select_one('#wob_dc').text

Have a look at SelectorGadget Chrome extension to grab CSS selectors by clicking on the desired elements in your browser.
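
As a small offline illustration of `select_one()`, the snippet below runs a few CSS selectors against the HTML fragment from the question. The class-based and parent-based selectors are extra examples chosen for illustration; only `#wob_dc` comes from the answer.

```python
from bs4 import BeautifulSoup

# The fragment posted in the question.
html = """
<div id="wob_dcp">
    <span class="vk_gy vk_sh" id="wob_dc">Clear with periodic clouds</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

by_id = soup.select_one("#wob_dc")               # match by id
by_class = soup.select_one("span.vk_gy.vk_sh")   # match by tag and both classes
by_parent = soup.select_one("#wob_dcp > span")   # match a direct child of the div
missing = soup.select_one("#does_not_exist")     # no match: select_one() returns None

for element in (by_id, by_class, by_parent):
    print(element.text)
print(missing)
```

Like `find()`, `select_one()` returns `None` when nothing matches, so the same `AttributeError` appears if the element is not in the fetched HTML.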

Code (a fuller example that scrapes more is in the online IDE):

from bs4 import BeautifulSoup
import requests
import lxml  # not used directly; importing it fails early if the 'lxml' parser is missing

headers = {
  "User-Agent":
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {
  "q": "일본 桜川市真壁町古城 내일 날씨",
  "hl": "en",
}

response = requests.get('https://www.google.com/search', headers=headers, params=params)
soup = BeautifulSoup(response.text, 'lxml')

location = soup.select_one('#wob_loc').text
weather_condition = soup.select_one('#wob_dc').text
temperature = soup.select_one('#wob_tm').text
precipitation = soup.select_one('#wob_pp').text
humidity = soup.select_one('#wob_hm').text
wind = soup.select_one('#wob_ws').text
current_time = soup.select_one('#wob_dts').text

print(f'Location: {location}\n'
      f'Weather condition: {weather_condition}\n'
      f'Temperature: {temperature}°F\n'
      f'Precipitation: {precipitation}\n'
      f'Humidity: {humidity}\n'
      f'Wind speed: {wind}\n'
      f'Current time: {current_time}\n')

------
'''
Location: Makabecho Furushiro, Sakuragawa, Ibaraki, Japan
Weather condition: Cloudy
Temperature: 79°F
Precipitation: 40%
Humidity: 81%
Wind speed: 7 mph
Current time: Saturday
'''
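
Passing the query through `params` instead of pasting a pre-encoded URL lets requests handle the percent-encoding. The sketch below prepares the request without sending it, to show the URL that would be used; the variable names are illustrative.

```python
import requests
from urllib.parse import parse_qs, urlsplit

params = {
    "q": "일본 桜川市真壁町古城 내일 날씨",
    "hl": "en",
}

# Build the request without sending it, to inspect the final URL.
prepared = requests.Request(
    "GET", "https://www.google.com/search", params=params
).prepare()
print(prepared.url)

# Decoding the query string gives back the original values.
decoded = parse_qs(urlsplit(prepared.url).query)
print(decoded["q"][0])
```

The non-ASCII characters are sent as UTF-8 percent-escapes, which matches the encoded form visible in the URL from the question.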

Alternatively, you can achieve the same thing by using the Direct Answer Box API from SerpApi. It's a paid API with a free plan.

The difference in your case is that you don't have to think about how to bypass blocks from Google or figure out why data from certain elements isn't being extracted as it should, since that's already handled for the end user. All that remains is to iterate over structured JSON and grab the data you want.

Code to integrate:

from serpapi import GoogleSearch
import os

params = {
  "engine": "google",
  "q": "일본 桜川市真壁町古城 내일 날씨",
  "api_key": os.getenv("API_KEY"),
  "hl": "en",
}

search = GoogleSearch(params)
results = search.get_dict()

loc = results['answer_box']['location']
weather_date = results['answer_box']['date']
weather = results['answer_box']['weather']
temp = results['answer_box']['temperature']
precipitation = results['answer_box']['precipitation']
humidity = results['answer_box']['humidity']
wind = results['answer_box']['wind']

print(f'{loc}\n{weather_date}\n{weather}\n{temp}°F\n{precipitation}\n{humidity}\n{wind}\n')

--------
'''
Makabecho Furushiro, Sakuragawa, Ibaraki, Japan
Saturday
Cloudy
79°F
40%
81%
7 mph
'''
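
Direct indexing such as `results['answer_box']['location']` raises `KeyError` if a key is missing. A hedged sketch of a more defensive read follows. It works on a plain dict and assumes the key names used above; whether the API always returns every key is not something this sketch verifies, and both the helper name `summarize_answer_box` and the sample data are hypothetical.

```python
def summarize_answer_box(results):
    """Build a one-line weather summary, or return None if there is no answer box."""
    answer_box = results.get("answer_box")
    if not answer_box:
        return None
    fields = ("location", "date", "weather", "temperature",
              "precipitation", "humidity", "wind")
    # Missing fields are reported as 'n/a' instead of raising KeyError.
    return ", ".join(f"{name}: {answer_box.get(name, 'n/a')}" for name in fields)


# Hypothetical sample shaped like the output above, with some fields left out.
sample_results = {
    "answer_box": {
        "location": "Makabecho Furushiro, Sakuragawa, Ibaraki, Japan",
        "date": "Saturday",
        "weather": "Cloudy",
        "temperature": "79",
    }
}

summary = summarize_answer_box(sample_results)
print(summary)
print(summarize_answer_box({}))
```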

Disclaimer: I work for SerpApi.

score -1 · Answer 4 · edited Feb 15 '21 at 07:39

I also had this problem. You should not import like this:

from bs4 import BeautifulSoup

You should import like this:

from bs4 import *

This should work.

edited Feb 15 '21 at 07:39 by Sid Kwakkel

answered Feb 15 '21 at 07:01 by Arson Basak

Your approach imports all the dependencies from `bs4` and might even import the dependencies which remain potentially unused. So, ideally, only the required dependencies from a module should be imported. – Abdul Mateen Feb 15 '21 at 07:43
