
I'm using the following code to extract links from a Google search, which I then use to fetch text that includes the keyword.

# -*- coding: utf-8 -*-

import json
import urllib.request, urllib.parse

def showsome(searchfor, rsz, start, lang):
    # urlencode already produces "q=<searchfor>", so the URL template
    # must not repeat the q parameter
    query = urllib.parse.urlencode({'q': searchfor})
    url = ('http://ajax.googleapis.com/ajax/services/search/web'
           '?v=1.0&{0}&rsz={1}&start={2}&hl={3}').format(query, rsz, start, lang)
    search_response = urllib.request.urlopen(url)
    search_results = search_response.read().decode("utf8")
    results = json.loads(search_results)

    data = results['responseData']
    print(data)

    hits = data['results']
    print(hits)
    listofLinks = []
    for h in hits:
        listofLinks.append(h['url'])
    return listofLinks

showsome('manger', '1', '4', 'fr')

However, at intervals I'm getting the following error:

Traceback (most recent call last):
  File "C:\Python33\code\htmlDraft.py", line 27, in <module>
    print(showsome('manger','4','1','fr'))
  File "C:\Python33\code\htmlDraft.py", line 17, in showsome
    hits = data['results']
TypeError: 'NoneType' object is not subscriptable

This means, roughly, that no data was received. Is that because Google is blocking me? I thought I was using their AJAX API.
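For what it's worth, a minimal sketch of guarding against this: when Google throttles the request, `responseData` comes back as `None` (and `responseStatus` is no longer 200), so checking before subscripting avoids the `TypeError`. The `extract_links` helper name is my own, not part of the API:

```python
def extract_links(results):
    """Return result URLs from a parsed AJAX-search response,
    or None when Google sent no data (i.e. we were throttled)."""
    data = results.get('responseData')
    if data is None:
        # Google returns {"responseData": null, "responseStatus": 503, ...}
        # when it decides it has seen too many requests from us.
        return None
    return [h['url'] for h in data.get('results', [])]

# Simulated responses, so the behaviour is visible without the network:
ok = {'responseData': {'results': [{'url': 'http://example.com'}]},
     'responseStatus': 200}
throttled = {'responseData': None, 'responseStatus': 503}

print(extract_links(ok))         # ['http://example.com']
print(extract_links(throttled))  # None
```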

UrbKr
  • It's working fine for me (I did have to change the initial import to `import urllib` and then called the `urllib.urlopen` and `urllib.urlencode` functions directly because I'm using a different version of python but it shouldn't affect you) – yuvi Oct 24 '13 at 18:29
  • The problem is that after doing it a couple of times, the program gets nothing from Google for a minute or so, at least for me. I presume it's because Google doesn't want me scraping, which is a shame since my whole concept of what I wanted to do with the sites (nothing shady, just getting some text to use for word context) goes down the drain. – UrbKr Oct 24 '13 at 18:36
  • You're right. I ran it a few times and eventually it stopped working. It actually makes sense: it prevents other search sites from just riding on Google's back. Perhaps this will be helpful: http://google-scraper.squabbel.com/ – yuvi Oct 24 '13 at 18:41
  • Also: http://stackoverflow.com/questions/5321434/python-easy-way-to-scrape-google-download-top-n-hits-entire-html-documents – yuvi Oct 24 '13 at 18:45
  • Thank you, I will check those out. – UrbKr Oct 24 '13 at 18:47
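Following up on the comments above: if occasional empty responses are acceptable, a simple retry with exponential backoff can ride out short throttling windows. This is a sketch, not a fix for the rate limit itself; `fetch` here is a stand-in for the real `urlopen`-and-parse call, injected so the retry logic can be shown without hitting Google:

```python
import time

def fetch_with_backoff(fetch, query, tries=4, delay=1.0):
    """Call fetch(query) until it returns non-None responseData,
    sleeping delay, 2*delay, 4*delay, ... between attempts."""
    for attempt in range(tries):
        results = fetch(query)
        if results.get('responseData') is not None:
            return results
        time.sleep(delay * (2 ** attempt))
    return None  # still throttled after every attempt

# A fake fetcher that is throttled twice, then succeeds:
calls = {'n': 0}
def fake_fetch(query):
    calls['n'] += 1
    if calls['n'] < 3:
        return {'responseData': None, 'responseStatus': 503}
    return {'responseData': {'results': [{'url': 'http://example.com'}]},
            'responseStatus': 200}

result = fetch_with_backoff(fake_fetch, 'manger', delay=0.01)
print(result['responseData']['results'][0]['url'])  # http://example.com
```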

0 Answers