
How can I use the Google Python client library to run an advanced search programmatically with the Google Custom Search API, so that it returns a list of the first n links matching the terms and parameters of the advanced search?

I checked the documentation (I did not find any example) and this answer. However, the latter did not work, since the AJAX API is no longer supported. So far I have tried this:

from googleapiclient.discovery import build
import pprint

my_cse_id = "test"

def google_search(search_term, api_key, cse_id, **kwargs):
    service = build("customsearch", "v1",developerKey="<My developer key>")
    res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
    return res['items']

results = google_search('dogs', my_api_key, my_cse_id, num=10)

for result in results:
    pprint.pprint(result)

And this:

import pprint

from googleapiclient.discovery import build


def main():
  service = build("customsearch", "v1",developerKey="<My developer key>")

  res = service.cse().list(q='dogs').execute()
  pprint.pprint(res)

if __name__ == '__main__':
  main()

So, any idea how to do an advanced search with Google's search engine API? This is how my credentials look in the Google console:

credentials

J.Do
  • What error do you get? – Eugene Lisitsky Dec 10 '16 at 19:17
  • @EugeneLisitsky, I did not get any error. The issue is that I do not understand how to make an [advanced search](https://www.google.ca/advanced_search) with Google's API. For example, how can I programmatically query Google for all the `urls` that contain `the best dog food` in `english` in the `UK`? – J.Do Dec 11 '16 at 19:53

3 Answers


First you need to define a custom search engine as described here, then make sure your my_cse_id matches the ID (cx) of that custom search engine, e.g.

cx='017576662512468239146:omuauf_lfve'

is a search engine which only searches for domains ending with .com.

Next we need our developerKey.

from googleapiclient.discovery import build
service = build("customsearch", "v1", developerKey=dev_key)

Now we can execute our search.

res = service.cse().list(q=search_term, cx=my_cse_id).execute()

We can add additional search parameters, like language or country by using the arguments described here, e.g.

res = service.cse().list(q="the best dog food", cx=my_cse_id, cr="countryUK", lr="lang_en").execute()

would search for "the best dog food" in English, restricted to sites from the UK.
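
The other fields of the advanced search form can be passed the same way. Below is a minimal sketch, assuming the parameter names listed in the cse.list reference (exactTerms, orTerms, excludeTerms, siteSearch, lr, cr). The helper build_advanced_params is hypothetical and not part of the client library; it only builds the keyword arguments and makes no request.

```python
def build_advanced_params(all_words, exact_phrase=None, any_words=None,
                          none_words=None, site=None, language=None,
                          country=None):
    """Translate advanced-search form fields into cse().list() keyword arguments.

    Hypothetical helper: the keys are parameter names taken from the
    cse.list reference; fields left as None are omitted.
    """
    params = {"q": all_words}
    optional = {
        "exactTerms": exact_phrase,   # "this exact word or phrase"
        "orTerms": any_words,         # "any of these words"
        "excludeTerms": none_words,   # "none of these words"
        "siteSearch": site,           # "site or domain"
        "lr": language,               # e.g. "lang_en"
        "cr": country,                # e.g. "countryUK"
    }
    params.update({k: v for k, v in optional.items() if v is not None})
    return params


params = build_advanced_params(
    "dog food",
    exact_phrase="the best dog food",
    none_words="cat",
    language="lang_en",
    country="countryUK",
)
print(params)
```

The resulting dict could then be unpacked into the call, e.g. service.cse().list(cx=my_cse_id, **params).execute().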


The following modified code worked for me. api_key was removed since it was never used.

from googleapiclient.discovery import build

my_cse_id = "012156694711735292392:rl7x1k3j0vy"
dev_key = "<Your developer key>"

def google_search(search_term, cse_id, **kwargs):
    service = build("customsearch", "v1", developerKey=dev_key)
    res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
    return res['items']

results = google_search('boxer dogs', my_cse_id, num=10, cr="countryCA", lr="lang_en")
for result in results:
    print(result.get('link'))

Output

http://www.aboxerworld.com/whiteboxerfaqs.htm
http://boxerrescueontario.com/?section=available_dogs
http://www.aboxerworld.com/abouttheboxerbreed.htm
http://m.huffpost.com/ca/entry/10992754
http://rawboxers.com/aboutraw.shtml
http://www.tanoakboxers.com/
http://www.mondlichtboxers.com/
http://www.tanoakboxers.com/puppies/
http://www.landosboxers.com/dogs/puppies/puppies.htm
http://www.boxerrescuequebec.com/
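
Since num only accepts values from 1 to 10, getting the first n links means requesting several pages with the start parameter (the 1-based index of the first result). The following is a rough sketch under that assumption; page_starts and google_search_n are hypothetical helpers, and service is assumed to be the object returned by build() above. The API may also cap how many results it returns in total, so very large n may not be reachable.

```python
def page_starts(n, page_size=10):
    """Yield (start, num) pairs that together cover the first n results."""
    for start in range(1, n + 1, page_size):
        yield start, min(page_size, n - start + 1)


def google_search_n(service, search_term, cse_id, n, **kwargs):
    """Collect up to n result links by paging through the results.

    `service` is assumed to come from build("customsearch", "v1", ...).
    """
    links = []
    for start, num in page_starts(n):
        res = service.cse().list(
            q=search_term, cx=cse_id, start=start, num=num, **kwargs
        ).execute()
        items = res.get("items", [])  # 'items' is absent when nothing matches
        links.extend(item.get("link") for item in items)
        if len(items) < num:
            break  # fewer results than requested: no further pages
    return links


print(list(page_starts(25)))  # [(1, 10), (11, 10), (21, 5)]
```

For example, google_search_n(service, 'boxer dogs', my_cse_id, 25, cr="countryCA", lr="lang_en") would issue three requests.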
Maximilian Peters
  • Thanks for the help! However, my question was about making an [advanced search](https://www.google.ca/advanced_search) (i.e. making a Google query with specific phrases, words, region, domain, language, etc.). My main objective is to do an advanced search programmatically. – J.Do Dec 11 '16 at 19:34
  • Also, what I do not understand is why your code sample just returns CS lecture links instead of dog links. Could you show us how to make an advanced search for all the URLs about boxer dogs in Seattle, in English? – J.Do Dec 11 '16 at 19:39
  • Thanks for the clarification! See the updated answer: boxer dogs in Canada, in English. – Maximilian Peters Dec 11 '16 at 19:50
  • Thanks, that's what I was looking to do. Now several questions arise from the above sample. Why when I set `num=90` I got: `HttpError: – J.Do Dec 11 '16 at 20:00
  • Also, what about the other parameters of the advanced search engine (e.g. none of these words, any of these words, this exact word or phrase, language, site or domain)? How can I pass them to `google_search()`? – J.Do Dec 11 '16 at 20:11
  • From the documentation: Valid values are integers between 1 and 10, inclusive. All the parameters are here: https://developers.google.com/custom-search/json-api/v1/reference/cse/list – Maximilian Peters Dec 11 '16 at 20:11
  • I see. My main objective is to run advanced search queries on Google in order to retrieve some interesting links, and finally store them. Is this the right way to do this? – J.Do Dec 11 '16 at 20:13
  • Hey. So `cx='017576662512468239146:omuauf_lfve'` searches just the .com domain. What should the cx be to search the entire web (.org, .uk etc.) and not just .com? – Digvijay Sawant Oct 19 '18 at 22:33
  • @DigvijaySawant: You need to define your own custom search engine, save it and then use the created ID. – Maximilian Peters Oct 20 '18 at 07:16
  • @MaximilianPeters Is there a tutorial that I could read? I am completely new to this and still figuring out how to do it. – Digvijay Sawant Oct 22 '18 at 03:52

An alternative using the Python requests library, if you do not want to use the Google discovery API:

import pprint

import requests

query = 'italy'
api_key = 'AIzaSyCs.....................'

# Call the Custom Search JSON endpoint directly; requests URL-encodes the params
response = requests.get('https://content.googleapis.com/customsearch/v1',
    params={'cx': '013027958806940070381:dazyknr8pvm', 'q': query, 'key': api_key})
pprint.pprint(response.json())
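
Extra search parameters and multi-word queries go into the same params dict and end up URL-encoded in the query string. Here is a minimal sketch, using only the standard library, for inspecting the URL such a request would target. build_search_url is a hypothetical helper, the key and engine ID are placeholders, and the encoding shown is that of urllib's urlencode, which should be roughly what requests does with a params dict.

```python
from urllib.parse import urlencode

# Endpoint as used in the snippet above
BASE_URL = "https://content.googleapis.com/customsearch/v1"


def build_search_url(query, cse_id, api_key, **extra):
    """Return the full request URL for a query plus optional extra parameters."""
    params = {"key": api_key, "cx": cse_id, "q": query}
    params.update(extra)  # e.g. lr="lang_en", cr="countryUK", num=10
    return BASE_URL + "?" + urlencode(params)


# Placeholder credentials, for illustration only
url = build_search_url("valencia party", "MY_CSE_ID", "MY_API_KEY", lr="lang_en")
print(url)
# https://content.googleapis.com/customsearch/v1?key=MY_API_KEY&cx=MY_CSE_ID&q=valencia+party&lr=lang_en
```

Printing the URL this way makes it easier to check what is actually being sent when a query returns nothing.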
shuckc
  • Thanks, it works, but why is nothing retrieved when we pass a query of multiple words, like "valencia party"? – Minions Feb 27 '18 at 11:04

This is late but hopefully it helps someone...

For advanced search use

response = service.cse().list(q="mysearchterm", cx="017576662512468239146:omuauf_lfve").execute()

The list() method takes more arguments that let you refine your search; check them here: https://developers.google.com/custom-search/json-api/v1/reference/cse/list

Martin
Allan Guwatudde