
Please forgive me, as I have limited knowledge of scraperwiki and Twitter mining.

I have the following code to scrape Twitter data. However, I want to edit the code so it only gives me results that are geotagged for New York on a particular date (let's say, April 1, 2013). Do you know how I should do this?

###############################################################################
# Twitter scraper for the term 'hello'.
###############################################################################

import scraperwiki
import simplejson

# retrieve a page
base_url = 'http://search.twitter.com/search.json?q='
q = 'hello'
options = '&rpp=10&page='
page = 1

while True:
    try:
        url = base_url + q + options + str(page)
        html = scraperwiki.scrape(url)
        soup = simplejson.loads(html)
        for result in soup['results']:
            data = {}
            data['id'] = result['id']
            data['text'] = result['text']
            data['from_user'] = result['from_user']
            data['created_at'] = result['created_at']
            # save records to the datastore
            scraperwiki.datastore.save(["id"], data)
        page += 1
    except Exception:
        # stop when a page fails to load or parse
        print str(page) + ' pages scraped'
        break

1 Answer


In addition to q, use the query parameters geocode and until; see the Search API page of the Twitter API documentation. Note that the Search API cannot return Tweets older than about a week.

Also, it's easier to build the query string with urllib.urlencode(), for example:

import urllib

base_url = 'http://search.twitter.com/search.json?'
query_dict = {'q': 'search term(s)', 'geocode': '37.781157,-122.398720,25mi', 'until': '2013-05-10'}
query = urllib.urlencode(query_dict)
response = urllib.urlopen(base_url + query).read()
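For your specific case, the loop above could build its URL the same way. The coordinates and radius below are assumptions (roughly central New York City), and the old v1 Search API this targets has since been retired, so treat this as a sketch of the query construction rather than a working scraper. The until parameter excludes the given date, so bounding a single day also needs a lower bound; one assumed way is the since: operator inside q:

```python
try:
    from urllib import urlencode          # Python 2
except ImportError:
    from urllib.parse import urlencode    # Python 3

base_url = 'http://search.twitter.com/search.json?'

# Assumed values: 40.7128,-74.0060 is roughly central NYC; the 15mi
# radius is a guess you may want to widen or narrow.
query_dict = {
    'q': 'hello since:2013-04-01',       # since: search operator bounds the start
    'geocode': '40.7128,-74.0060,15mi',  # lat,lon,radius
    'until': '2013-04-02',               # Tweets before this date
    'rpp': 10,
}
url = base_url + urlencode(query_dict)
print(url)
```

You would then page through results as in your original loop, with &page= appended to this URL.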

Update: Please see this example scraper that you can copy and adapt to your needs.

Suzana
  • Thank you so much! I'm so appreciative of your help. If it's not too much of a hassle, I would like to clarify one thing. I want to be able to feed this output into ArcGIS - with the geocode coordinates. Is there a way I can tweak this result with a printout of the coordinates? `for result in result_json['results']: scraperwiki.sqlite.save(unique_keys=['id'], data=result, table_name="Tweets")` – user2368126 May 10 '13 at 01:53
  • You can add the coordinates you searched for just before you put the data into the SQLite table: `result['geocode'] = geocode` I've adjusted the scraper accordingly. – Suzana May 10 '13 at 02:26
  • So incredibly helpful. Thank you!! – user2368126 May 10 '13 at 10:37
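Following up on the comments above, a minimal sketch of that idea: the Search API does not return the searched coordinates for you, so you attach them to each record yourself before saving. The helper name and the placeholder record are illustrative, not part of any library; splitting out numeric lat/lon fields is an assumed convenience for a GIS import:

```python
def add_coordinates(record, geocode):
    """Copy the searched geocode onto a tweet record and add numeric fields."""
    lat, lon = geocode.split(',')[:2]  # drop the radius component
    record['geocode'] = geocode
    record['lat'] = float(lat)
    record['lon'] = float(lon)
    return record

geocode = '40.7128,-74.0060,15mi'    # hypothetical search area: central NYC
tweet = {'id': 1, 'text': 'hello'}   # placeholder record for illustration
tweet = add_coordinates(tweet, geocode)
# tweet now carries lat/lon columns suitable for loading into ArcGIS
```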