I've been experimenting with the requests and bs4 modules for a couple of days now. I wanted to make a simple program similar to Google's 'I'm Feeling Lucky' button.

Here's my code:

import requests, bs4, webbrowser

source=requests.get('https://www.google.com/search?q=facebook').text

exsoup=bs4.BeautifulSoup(source, 'lxml')
# <cite class="iUh30">https://www.facebook.com/</cite>
match=exsoup.find('cite', class_='iUh30')

print(match.text)

But when I run this I get the following error:

    print(match.text)
AttributeError: 'NoneType' object has no attribute 'text'

How can I make this work?

Shell1500
  • There don't seem to be any hits matching that search term when I run your code. You're getting None back because find() isn't finding any match for `('cite', class_='iUh30')`. – Andrew McDowell Nov 05 '18 at 17:00
  • @AndrewMcDowell I tried this by first downloading the HTML source of the same page, reading the file with open() and using it as the source, and it worked just fine. I don't know why this isn't working. – Shell1500 Nov 05 '18 at 17:16
  • I tested this and got the same results. The source content seems to differ between viewing the site in a browser and fetching it with the requests library: the cite tag only has the iUh30 class when I view it in a browser. I'd guess Google serves different markup depending on how the page is requested. – Andrew McDowell Nov 05 '18 at 17:32
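
To see which markup the requests-fetched page really contains, one option is to print the class list of every `cite` tag. A minimal sketch, assuming the bs4 package with Python's built-in 'html.parser' (so lxml isn't needed); the function name `cite_classes` and the sample HTML are hypothetical, and in practice you would pass the text returned by `requests.get(...)`:

```python
from bs4 import BeautifulSoup


def cite_classes(html):
    """Return (class list, text) for every <cite> tag in the HTML."""
    soup = BeautifulSoup(html, 'html.parser')
    # tag.get('class') is a list of class names, or None if the tag has no class
    return [(tag.get('class'), tag.get_text()) for tag in soup.find_all('cite')]


# Hypothetical markup standing in for what requests might receive
sample = ('<div><cite>https://www.facebook.com/</cite>'
          '<cite class="iUh30">https://example.com/</cite></div>')

for classes, text in cite_classes(sample):
    print(classes, text)
```

If none of the printed class lists contains 'iUh30', that explains why `find('cite', class_='iUh30')` returns None.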

2 Answers

Try iterating over all `cite` tags instead, without the class_ filter:

match=exsoup.find_all('cite')

for i in match:
    if 'http' in i.text:
        print(i.text)
Dmitriy Fialkovskiy
  • Whoa, thanks a lot! It worked. One more thing though: what would I do if I wanted to get a list of all cite elements of a certain class? – Shell1500 Nov 05 '18 at 17:29
  • When you send a request to a site, by default requests sends a non-browser user agent, e.g. 'User-Agent': 'python-requests/1.2.0' (see http://docs.python-requests.org/en/master/user/advanced/). I think Google responds to different user agents with different (or maybe dynamically generated) `class` values on the `cite` tags. – Dmitriy Fialkovskiy Nov 05 '18 at 17:34
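
For the follow-up question about collecting every `cite` element of a particular class: `find_all()` accepts the same `class_` filter as `find()` and returns a list (empty if nothing matches), and `select()` with a CSS selector is an equivalent alternative. A small sketch on hypothetical sample markup, using the built-in 'html.parser':

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; in the real script this would be the fetched page
sample = ('<cite class="iUh30">https://www.facebook.com/</cite>'
          '<cite class="other">https://example.org/</cite>'
          '<cite class="iUh30 extra">https://en-gb.facebook.com/</cite>')
soup = BeautifulSoup(sample, 'html.parser')

# find_all with class_ returns every <cite> whose class list contains 'iUh30'
by_class = [tag.text for tag in soup.find_all('cite', class_='iUh30')]

# The same query written as a CSS selector
by_selector = [tag.text for tag in soup.select('cite.iUh30')]

print(by_class)
```

Because `find_all()` returns an empty list rather than None when nothing matches, looping over its result never raises the AttributeError from the question.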

The issue seems to be that you get different results when visiting the site in a browser than when fetching it with the requests library. You could try specifying a User-Agent header (I took this example from https://stackoverflow.com/a/27652558/9742036):

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

source = requests.get('https://www.google.com/search?q=facebook', headers=headers).text

With that header, the returned source should look more like what your browser receives.
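
To compare what requests sends by default with the browser-like header above, the request can be built and inspected without actually sending it. A sketch, assuming a reasonably recent requests release; the exact default string depends on the installed version:

```python
import requests

url = 'https://www.google.com/search?q=facebook'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

# What requests sends when no User-Agent is given, e.g. 'python-requests/2.x.y'
default_ua = requests.utils.default_user_agent()

# Build the request without sending it so the outgoing headers can be inspected
prepared = requests.Request('GET', url, headers=headers).prepare()

print(default_ua)
print(prepared.headers['User-Agent'])
```

No network traffic happens here; passing `headers=headers` to `requests.get()` sends the same User-Agent for real.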

Otherwise, your code works fine. You're just getting no match in the original response, so you should write code to handle that case (for example, by using the loop suggested in the other answer).
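
One way to handle the no-match case is to check for None before using `.text`, and fall back to the loop from the other answer. A minimal sketch; the helper name `first_result` is hypothetical and the sample HTML is made up:

```python
from bs4 import BeautifulSoup


def first_result(html, css_class='iUh30'):
    """Return the first result URL, or None if the page has none.

    Tries the class seen in the browser first, then falls back to any
    <cite> whose text contains 'http'.
    """
    soup = BeautifulSoup(html, 'html.parser')
    match = soup.find('cite', class_=css_class)
    if match is not None:  # find() returns None when nothing matches
        return match.text
    for cite in soup.find_all('cite'):
        if 'http' in cite.text:
            return cite.text
    return None  # no usable result on this page


print(first_result('<cite class="iUh30">https://www.facebook.com/</cite>'))
print(first_result('<cite>https://www.facebook.com/</cite>'))
print(first_result('<p>no results</p>'))
```

In the original script, the returned URL could then be passed to `webbrowser.open()` only when it is not None.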

Andrew McDowell