
I have written the following code, which attempts to search Google using Beautiful Soup:

import requests
from bs4 import BeautifulSoup
 
def search_error(statement):
    print("Googling.......")
    google_search = requests.get("https://www.google.com/search?q=" + statement)
    soup = BeautifulSoup(google_search.text, 'html.parser')
    search_result = soup.select(".r a")

    for link in search_result:
        print(link)
if __name__ == '__main__':
    statement = input("Enter the Statement of Error to find it on Stack Overflow: ")
    search_error(statement)

However, the code does not return the expected output: the search_result variable is an empty list, whereas I expected it to contain all of the search results.

What is wrong with this code, and how should I modify it to obtain all the search results for the keyword statement?

Cody Gray - on strike

3 Answers


The search_result is empty because you are passing the response's decoded text rather than its raw content to the parser (BeautifulSoup). Try sending the content:

soup = BeautifulSoup(google_search.content, 'html.parser')
search_result = soup.find_all('a',{"data-uch":1})

That should probably do the trick.

BDL
  • Check what the soup variable is returning. Also check the status code of the google_search GET request; requests.get should normally return 200 OK. – Nikhil Raikar Jul 27 '20 at 08:16
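The status-code check suggested in the comment can be sketched as follows. This is a minimal diagnostic, not part of the original answer: the "unusual traffic" phrase it looks for is the one Google's bot-check page contains, and keep in mind that Google may return 200 even when it serves that page, so the body has to be inspected too.

```python
import requests

def looks_like_bot_check(html):
    # Google's interstitial page contains this phrase when it
    # suspects automated traffic instead of a human visitor.
    return "unusual traffic" in html

def diagnose_search(query):
    # Fetch the search page and report the HTTP status code.
    # A 200 alone does not guarantee real results, so also check
    # whether the body is the bot-check page.
    response = requests.get("https://www.google.com/search?q=" + query)
    print("Status code:", response.status_code)
    if looks_like_bot_check(response.text):
        print("Google served its bot-check page, not real results.")
    return response
```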

Please print the soup variable and attach the output, because when I run your code and print the soup variable, it shows:

Our systems have detected unusual traffic from your computer network.  This page checks to see if it's really you sending the requests, and not a robot.

Maybe that's why you aren't getting any results from soup.select(".r a").

Attach the output after printing the soup variable's value so we can see what you're getting.

Indrajeet Singh

It's because:

  1. no User-Agent header is specified, so Google will eventually block the request. What is my user-agent
  2. no attribute is specified when extracting links in the for loop, e.g. the ['href'] attribute.

Code:

import requests, lxml
from bs4 import BeautifulSoup
 
def search_error(statement):

    headers = {
        'User-agent':
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
    }

    soup = BeautifulSoup(requests.get(f"https://www.google.com/search?q={statement}", headers=headers).text, 'lxml')
    search_result = soup.select(".yuRUbf")

    for link in search_result:
        print(link.a['href'])

if __name__ == '__main__':
    statement = input("Enter the Statement of Error to find it on Stack Overflow: ")
    search_error(statement)

-------------
'''
Enter the Statement of Error to find it on Stack Overflow: regex match subdomain

https://stackoverflow.com/questions/7930751/regexp-for-subdomain/7933253
https://stackoverflow.com/questions/19272892/regex-to-match-all-subdomains-of-a-matched-domains
https://askubuntu.com/questions/1158962/grep-and-regex-filter-subdomains-in-a-file
...
'''
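One more point worth noting: if the query contains spaces or special characters, it should be URL-encoded before being appended to the search URL, otherwise the request may be malformed. A minimal sketch using the standard library's urllib.parse.quote_plus (the example statement is illustrative):

```python
from urllib.parse import quote_plus

statement = "TypeError: 'NoneType' object is not subscriptable"

# quote_plus() turns spaces into '+' and escapes characters such as
# ':' and "'" so the resulting query string stays valid.
url = "https://www.google.com/search?q=" + quote_plus(statement)
print(url)
# https://www.google.com/search?q=TypeError%3A+%27NoneType%27+object+is+not+subscriptable
```

Alternatively, passing the query via requests.get(url, params={"q": statement}) lets requests perform the same encoding automatically.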

Alternatively, you can do the same thing by using Google Organic Results API from SerpApi. It's a paid API with a free plan.

The difference in your case is that you would only need to iterate over structured JSON and get the data you want, rather than figuring out how to extract things and understand why Google isn't returning what you were looking for.

Code to integrate:

from serpapi import GoogleSearch

def search_error(statement):
    params = {
      "api_key": "YOUR_API_KEY",
      "engine": "google",
      "q": statement,
      "hl": "en"
    }

    search = GoogleSearch(params)
    results = search.get_dict()

    for result in results['organic_results']:
      print(result['link'])
      

if __name__ == '__main__':
    statement = input("Enter the Statement of Error to find it on Stack Overflow: ")
    search_error(statement)

-------------
'''
Enter the Statement of Error to find it on Stack Overflow: regex match subdomain

https://stackoverflow.com/questions/7930751/regexp-for-subdomain
https://stackoverflow.com/questions/8959765/need-regex-to-get-domain-subdomain/8959842
https://askubuntu.com/questions/1158962/grep-and-regex-filter-subdomains-in-a-file
...
'''

P.S. - I wrote a more detailed blog post about how to scrape Google Organic Search results.

Disclaimer, I work for SerpApi.

Dmitriy Zub