It's because:
- no user-agent is specified, so Google will eventually block the request (see: What is my user-agent).
- no attribute is specified when extracting links in the for loop, e.g. the ['href'] attribute.
Code:
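import requests, lxml
from bs4 import BeautifulSoup


def search_error(statement):
    # Without a browser-like user-agent, Google will eventually block the request.
    headers = {
        'User-agent':
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
    }
    # Passing the query via params lets requests URL-encode it.
    response = requests.get("https://www.google.com/search", params={"q": statement}, headers=headers)
    soup = BeautifulSoup(response.text, 'lxml')

    # Google's markup changes over time, so this selector may need updating.
    search_result = soup.select(".yuRUbf")
    for link in search_result:
        # Take the href attribute of the <a> tag, not the tag itself.
        print(link.a['href'])


if __name__ == '__main__':
    statement = input("Enter the Statement of Error to find it on Stack Overflow: ")
    search_error(statement)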
-------------
'''
Enter the Statement of Error to find it on Stack Overflow: regex match subdomain
https://stackoverflow.com/questions/7930751/regexp-for-subdomain/7933253
https://stackoverflow.com/questions/19272892/regex-to-match-all-subdomains-of-a-matched-domains
https://askubuntu.com/questions/1158962/grep-and-regex-filter-subdomains-in-a-file
...
'''
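Error statements tend to contain spaces, quotes, & and #, so the query has to be percent-encoded before it ends up in the q parameter. Here is a minimal sketch of that step using only the standard library. build_search_url is a hypothetical helper name and the example error message is made up; neither is part of the code above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(statement):
    """Return a Google search URL with the statement percent-encoded (hypothetical helper)."""
    # urlencode escapes characters such as '&' and '#' that would otherwise
    # be read as URL syntax instead of as part of the query.
    return "https://www.google.com/search?" + urlencode({"q": statement})


# Made-up error message, used only for illustration.
url = build_search_url("TypeError: can't concat str & bytes #42")

# Decoding the query string gives the original statement back.
decoded = parse_qs(urlsplit(url).query)["q"][0]
print(url)
print(decoded)
```

If the query is passed to requests.get through its params argument, requests does this encoding for you.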
Alternatively, you can do the same thing with the Google Organic Results API from SerpApi. It's a paid API with a free plan.
The difference in your case is that you only need to iterate over structured JSON and pick out the data you want, rather than figuring out how to extract things and working out why Google isn't returning what you were looking for.
Code to integrate:
from serpapi import GoogleSearch


def search_error(statement):
    params = {
        "api_key": "YOUR_API_KEY",
        "engine": "google",
        "q": statement,
        "hl": "en"
    }

    search = GoogleSearch(params)
    # The response is returned as a plain Python dict.
    results = search.get_dict()

    for result in results['organic_results']:
        print(result['link'])


if __name__ == '__main__':
    statement = input("Enter the Statement of Error to find it on Stack Overflow: ")
    search_error(statement)
-------------
'''
Enter the Statement of Error to find it on Stack Overflow: regex match subdomain
https://stackoverflow.com/questions/7930751/regexp-for-subdomain
https://stackoverflow.com/questions/8959765/need-regex-to-get-domain-subdomain/8959842
https://askubuntu.com/questions/1158962/grep-and-regex-filter-subdomains-in-a-file
...
'''
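The loop above indexes results['organic_results'] directly, which raises a KeyError if that key is absent. Here is a slightly more defensive sketch, assuming a response can come back without the key (for example when there are no results). extract_links is a hypothetical helper, and the sample dict is hand-written to mirror the shape used above; it is not real API output.

```python
def extract_links(results):
    """Collect the 'link' of every organic result, skipping entries without one."""
    links = []
    # .get() avoids a KeyError when 'organic_results' is missing.
    for result in results.get("organic_results", []):
        link = result.get("link")
        if link:
            links.append(link)
    return links


# Hand-written sample, shaped like the dict used in the loop above.
sample = {
    "organic_results": [
        {"link": "https://stackoverflow.com/questions/7930751/regexp-for-subdomain"},
        {"title": "entry without a link"},
    ]
}

print(extract_links(sample))
print(extract_links({}))
```

In search_error you would call extract_links(search.get_dict()) and print each returned link.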
P.S. I wrote a more detailed blog post about how to scrape Google organic search results.
Disclaimer: I work for SerpApi.