
I am having a little trouble getting this to produce output after searching. Nothing happens, such as a web browser opening. Am I doing something wrong? Any tips or advice would be much appreciated. Here's the code I entered from the book:

#! /usr/bin/env python3
# searchpypi.py  - Opens several search results.

import requests, sys, webbrowser, bs4
print('Searching...')    # display text while downloading the search result page
res = requests.get('https://www.duckduckgo.com/search?q='+''.join(sys.argv[1:]))
res.raise_for_status()

# TODO: Retrieve top search result links.
soup = bs4.BeautifulSoup(res.text, 'html.parser')

# TODO: Open a browser tab for each result.
linkElems = soup.select('.package-snippet')
for elem in linkElems[:5]:   # first 5 elements in list
    urlToOpen = 'https://pypi.org' + elem.get('href')
    print('Opening', urlToOpen)
    webbrowser.open(urlToOpen)

Makobak232
  • Often, search engines will detect that you're coming at them with something that looks like a bot, and will refuse to answer. Check whether the search engine has a supported way of calling it. – LhasaDad Mar 08 '20 at 21:58

3 Answers

0

Okay, so a couple of things here:

Often, search engines will detect that you're coming at them with something that looks like a bot, and will refuse to answer. Check whether the search engine has a supported way of calling it, and add a user-agent to the headers you send with your HTTP requests.

There is also a better way to handle query strings with the requests library than string concatenation.

So the request portion of your code should look something like this:

import requests, sys, webbrowser
from bs4 import BeautifulSoup   # save yourself some unnecessary typing (and possible errors) down the line

print('Searching...')    # display text while downloading the search result page

params = {'q': ' '.join(sys.argv[1:])}   # requests URL-encodes the query for you
headers = {'user-agent': 'Mozilla/5.0 (Linux; rv:1.0)'}

res = requests.get('https://www.duckduckgo.com/search', params=params, headers=headers)
res.raise_for_status()
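To see what passing `params=` buys you, the standard library's `urlencode` produces the same percent-encoded query string that requests builds internally (a minimal sketch; the argument list is a hypothetical stand-in for `sys.argv[1:]`):

```python
from urllib.parse import urlencode

# Hypothetical stand-in for sys.argv[1:]
args = ['boring', 'stuff']

# urlencode percent-encodes the value, just as requests does with params=;
# manual string concatenation would leave the space in the URL untouched
query = urlencode({'q': ' '.join(args)})
url = 'https://www.duckduckgo.com/search?' + query
print(url)  # https://www.duckduckgo.com/search?q=boring+stuff
```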

Finally, while working with BeautifulSoup, you should consider using the lxml parser instead of html.parser, because it is generally faster, and you will want that speed while crawling pages.
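Switching parsers is a one-argument change; a minimal sketch that prefers lxml but falls back to the stdlib parser if lxml is not installed (the sample markup is a made-up stand-in for the real search-results page):

```python
import bs4

# Tiny hypothetical sample of the kind of markup the script scrapes
html = '<div class="package-snippet"><a href="/project/foo/">foo</a></div>'

# Prefer the faster lxml parser, but fall back to the stdlib parser
# if lxml is not installed
try:
    soup = bs4.BeautifulSoup(html, 'lxml')
except bs4.FeatureNotFound:
    soup = bs4.BeautifulSoup(html, 'html.parser')

links = [a.get('href') for a in soup.select('.package-snippet a')]
print(links)
```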

maestro.inc
0

The code does not work when run exactly as written in the book (2nd edition).

The one line you need to change is written exactly as follows in the book:

res = requests.get('https://google.com/search?q=' 'https://pypi.org/search/?q=' + ' '.join(sys.argv[1:]))

To get it to work, just change the code to the following:

res = requests.get('https://pypi.org/search/?q=' + ' '.join(sys.argv[1:]))
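For example, with `boring stuff` on the command line, `' '.join(sys.argv[1:])` turns the argument list back into a single query string (a minimal sketch; the list is a stand-in for `sys.argv[1:]`, and requests will percent-encode the space when it actually sends the request):

```python
# Stand-in for sys.argv[1:] when the script is run as:
#   python searchpypi.py boring stuff
args = ['boring', 'stuff']
url = 'https://pypi.org/search/?q=' + ' '.join(args)
print(url)  # https://pypi.org/search/?q=boring stuff
```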
-1

Try changing the requests.get line to the following:

res = requests.get('http://pypi.org/search/?q=' + ' '.join(sys.argv[1:]))

Run the following from the command line (assuming the file is named Chapter 12_searchpypi.py):

python "Chapter 12_searchpypi.py" "boring stuff"
Gino Mempin