
I have written the following code, which attempts to search Google using Beautiful Soup:

import requests
from bs4 import BeautifulSoup
 
def search_error(statement):
    print("Googling.......")
    google_search = requests.get("https://www.google.com/search?q=" + statement)
    soup = BeautifulSoup(google_search.text, 'html.parser')
    search_result = soup.select(".r a")

    for link in search_result:
        print(link)
if __name__ == '__main__':
    statement = input("Enter the Statement of Error to find it on Stack Overflow: ")
    search_error(statement)

However, the code does not return the expected output: the search_result variable is an empty list, whereas I expected it to contain all of the search results.

What is wrong with this code, and how should I modify it to obtain all the search results for the keyword statement?

Cody Gray - on strike

3 Answers


The search_result is empty because you are passing the response's decoded text rather than its raw content to the parser (BeautifulSoup). Try sending the content:

soup = BeautifulSoup(google_search.content, 'html.parser')
search_result = soup.find_all('a',{"data-uch":1})

That should probably do the trick.

BDL
  • Check what the soup variable is returning. Also check the status code of the google_search GET request; requests.get should normally return 200 OK. – Nikhil Raikar Jul 27 '20 at 08:16
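The status-code check suggested in the comment can be sketched as follows. This is a minimal diagnostic, not part of the original answer: the "unusual traffic" phrase it looks for is the one Google's bot-check page contains, and keep in mind that Google may return 200 even when it serves that page, so the body has to be inspected too.

```python
import requests

def looks_like_bot_check(html):
    # Google's interstitial page contains this phrase when it
    # suspects automated traffic instead of a human visitor.
    return "unusual traffic" in html

def diagnose_search(query):
    # Fetch the search page and report the HTTP status code.
    # A 200 alone does not guarantee real results, so also check
    # whether the body is the bot-check page.
    response = requests.get("https://www.google.com/search?q=" + query)
    print("Status code:", response.status_code)
    if looks_like_bot_check(response.text):
        print("Google served its bot-check page, not real results.")
    return response
```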

Please print the soup variable and attach the output, because when I run your code and print the soup variable, it shows:

Our systems have detected unusual traffic from your computer network.  This page checks to see if it's really you sending the requests, and not a robot.

Maybe that's why you aren't getting any results from soup.select(".r a").

Attach the output after printing the soup variable's value so we can see what you're getting.

Indrajeet Singh

It's because:

  1. no User-Agent header is specified, so Google will eventually block the request. What is my user-agent
  2. no attribute is specified when extracting links in the for loop, e.g. the ['href'] attribute.

Code:

import requests, lxml
from bs4 import BeautifulSoup
 
def search_error(statement):

    headers = {
        'User-agent':
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
    }

    soup = BeautifulSoup(requests.get(f"https://www.google.com/search?q={statement}", headers=headers).text, 'lxml')
    search_result = soup.select(".yuRUbf")

    for link in search_result:
        print(link.a['href'])

if __name__ == '__main__':
    statement = input("Enter the Statement of Error to find it on Stack Overflow: ")
    search_error(statement)

-------------
'''
Enter the Statement of Error to find it on Stack Overflow: regex match subdomain

https://stackoverflow.com/questions/7930751/regexp-for-subdomain/7933253
https://stackoverflow.com/questions/19272892/regex-to-match-all-subdomains-of-a-matched-domains
https://askubuntu.com/questions/1158962/grep-and-regex-filter-subdomains-in-a-file
...
'''
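One more point worth noting: if the query contains spaces or special characters, it should be URL-encoded before being appended to the search URL, otherwise the request may be malformed. A minimal sketch using the standard library's urllib.parse.quote_plus (the example statement is illustrative):

```python
from urllib.parse import quote_plus

statement = "TypeError: 'NoneType' object is not subscriptable"

# quote_plus() turns spaces into '+' and escapes characters such as
# ':' and "'" so the resulting query string stays valid.
url = "https://www.google.com/search?q=" + quote_plus(statement)
print(url)
# https://www.google.com/search?q=TypeError%3A+%27NoneType%27+object+is+not+subscriptable
```

Alternatively, passing the query via requests.get(url, params={"q": statement}) lets requests perform the same encoding automatically.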

Alternatively, you can do the same thing by using Google Organic Results API from SerpApi. It's a paid API with a free plan.

The difference in your case is that you would only need to iterate over structured JSON and get the data you want, rather than figuring out how to extract things and understand why Google isn't returning what you were looking for.

Code to integrate:

from serpapi import GoogleSearch

def search_error(statement):
    params = {
      "api_key": "YOUR_API_KEY",
      "engine": "google",
      "q": statement,
      "hl": "en"
    }

    search = GoogleSearch(params)
    results = search.get_dict()

    for result in results['organic_results']:
      print(result['link'])
      

if __name__ == '__main__':
    statement = input("Enter the Statement of Error to find it on Stack Overflow: ")
    search_error(statement)

-------------
'''
Enter the Statement of Error to find it on Stack Overflow: regex match subdomain

https://stackoverflow.com/questions/7930751/regexp-for-subdomain
https://stackoverflow.com/questions/8959765/need-regex-to-get-domain-subdomain/8959842
https://askubuntu.com/questions/1158962/grep-and-regex-filter-subdomains-in-a-file
...
'''

P.S. - I wrote a more detailed blog post about how to scrape Google Organic Search results.

Disclaimer, I work for SerpApi.

Dmitriy Zub