I am trying to build a program that searches Google for a query and opens at most 5 of the search results in the browser. However, soup.select('.r a') in the program returns an empty list.

import sys
import webbrowser

import bs4
import requests

# Join the command-line arguments with spaces to form the search query.
res = requests.get('http://google.com/search?q=' + ' '.join(sys.argv[1:]))
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'html.parser')
linkElements = soup.select('.r a')  # this is the call that returns an empty list
# Open at most 5 results.
linkToOpen = min(5, len(linkElements))
for i in range(linkToOpen):
    webbrowser.open('https://google.com' + linkElements[i].get('href'))

The code runs without any error and without any output, but it does not open the browser with the search results as it is supposed to.

Pranjal Pathak
  • JavaScript changes the classes once the HTML loads. If you print the HTML you download with requests, you'll most likely see that it doesn't match what you inspect in a browser. – facelessuser Jul 19 '19 at 00:35
  • I am really sorry, but I am new to programming. I would really appreciate it if you could explain a little more about it. Is there a way around it? Is it possible to make it work? – Pranjal Pathak Jul 19 '19 at 00:50
  • This question has been asked before, you'll find the answer here: https://stackoverflow.com/questions/56664934/soup-select-r-a-in-fhttps-google-com-searchq-query-brings-back-empty – facelessuser Jul 19 '19 at 00:59
  • Thanks for sharing. I tried the solution suggested by Aravind but it does not seem to be working. It again returns an empty list! – Pranjal Pathak Jul 19 '19 at 03:34

1 Answer

The page relies heavily on JavaScript, so the HTML you get through requests isn't exactly what you see in your browser. You could use this script to scrape the search links from the page (it selects all links whose href begins with /url?q=):

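import requests
import bs4

res = requests.get('http://google.com/search?q=Python')
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'lxml')  # needs the lxml package installed

# Result links are redirect links whose href starts with /url?q=
for a in soup.select('a[href^="/url?q="]'):
    # Skip the sign-in link, which uses the same href prefix.
    if 'accounts.google.com' in a['href']:
        continue
    print(a['href'])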

Prints:

/url?q=https://www.python.org/&sa=U&ved=2ahUKEwiCivuzmMDjAhVtxMQBHWfIBxYQFjAAegQIBxAB&usg=AOvVaw3TQfZO4gqXrTLm27x1qkJF
/url?q=https://www.python.org/downloads/&sa=U&ved=2ahUKEwiCivuzmMDjAhVtxMQBHWfIBxYQjBAwAXoECAcQAw&usg=AOvVaw1ktQJcwOoHkm6N4OpYlgA-
/url?q=https://www.python.org/downloads/release/python-373/&sa=U&ved=2ahUKEwiCivuzmMDjAhVtxMQBHWfIBxYQjBAwAnoECAcQBQ&usg=AOvVaw1DkCjMJbFGfNpiQw1qDBWB
/url?q=https://www.python.org/about/gettingstarted/&sa=U&ved=2ahUKEwiCivuzmMDjAhVtxMQBHWfIBxYQjBAwA3oECAcQBw&usg=AOvVaw1ih35T-Enlb7d32gyyNGvc
...and so on.
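
The hrefs printed above are redirect links, so to open the actual pages you would probably want the value of the q parameter rather than the whole href. Here is a minimal sketch of that step using only the standard library. The helper name extract_target is made up for this example, and it assumes the hrefs keep the /url?q=... shape shown in the output; Google may change that format at any time.

```python
from urllib.parse import parse_qs, urlsplit


def extract_target(href):
    """Return the destination URL from a '/url?q=...' href, or None.

    Hypothetical helper: assumes the redirect format shown in the output above.
    """
    parts = urlsplit(href)
    if parts.path != '/url':
        return None
    targets = parse_qs(parts.query).get('q')
    return targets[0] if targets else None


# Two hrefs copied from the output above.
hrefs = [
    '/url?q=https://www.python.org/&sa=U&ved=2ahUKEwiCivuzmMDjAhVtxMQBHWfIBxYQFjAAegQIBxAB&usg=AOvVaw3TQfZO4gqXrTLm27x1qkJF',
    '/url?q=https://www.python.org/downloads/&sa=U&ved=2ahUKEwiCivuzmMDjAhVtxMQBHWfIBxYQjBAwAXoECAcQAw&usg=AOvVaw1ktQJcwOoHkm6N4OpYlgA-',
]

# Keep at most 5 usable targets, as in the question.
targets = [t for t in map(extract_target, hrefs) if t][:5]
for url in targets:
    print(url)
```

To open the pages instead of printing them, you could replace print(url) with webbrowser.open(url) after importing webbrowser.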
Andrej Kesely