#!/usr/bin/python
from TwitterSearch import *

import sys
import csv

tso = TwitterSearchOrder() # create a TwitterSearchOrder object
tso.set_keywords(['gmo']) # define the keywords we want to search for
tso.set_language('en') # we want to see English tweets only
tso.set_include_entities(False) # and don't give us all that entity information

max_range = 1           # search range in kilometres
num_results = 500       # minimum results to obtain
outfile = "output.csv"


# create twitter API object
twitter = TwitterSearch(
                        access_token = "764537836884242432-GzJmUSL4hcC2DOJD71TiQXwCA0aGosz",
                        access_token_secret = "zDGYDeigRqDkmdqTgBOltcfNcNnfLwRZPkPLlnFyY3xqQ",
                        consumer_key = "Kr9ThiJWvPa1uTXZoj4O0YaSG",
                        consumer_secret = "ozGCkXtTCyCdOcL7ZFO4PJs85IaijjEuhl6iIdZU0AdH9CCoxS"
                        )

# Create an array of USA states
ustates = [
           "AL",
           "AK",
           "AS",
           "AZ",
           "AR",
           "CA",
           "CO",
           "CT",
           "DE",
           "DC",
           "FM",
           "FL",
           "GA",
           "GU",
           "HI",
           "ID",
           "IL",
           "IN",
           "IA",
           "KS",
           "KY",
           "LA",
           "ME",
           "MH",
           "MD",
           "MA",
           "MI",
           "MN",
           "MS",
           "MO",
           "MT",
           "NE",
           "NV",
           "NH",
           "NJ",
           "NM",
           "NY",
           "NC",
           "ND",
           "MP",
           "OH",
           "OK",
           "OR",
           "PW",
           "PA",
           "PR",
           "RI",
           "SC",
           "SD",
           "TN",
           "TX",
           "UT",
           "VT",
           "VI",
           "VA",
           "WA",
           "WV",
           "WI",
           "WY",
           "USA"
           ]

def linearSearch(item, obj, start=0):
    for i in range(start, len(obj)):
        if item == obj[i]:
            return True
    return False
# open a file to write (mode "w"), and create a CSV writer object
csvfile = file(outfile, "w")
csvwriter = csv.writer(csvfile)

# add headings to our CSV file
row = [ "user", "text", "place"]
csvwriter.writerow(row)

#-----------------------------------------------------------------------
# the twitter API only allows us to query up to 100 tweets at a time.
# to search for more, we will break our search up into 10 "pages", each
# of which will include 100 matching tweets.
#-----------------------------------------------------------------------
result_count = 0
last_id = None

while result_count <  num_results:
    # perform a search based on latitude and longitude
    # twitter API docs: https://dev.twitter.com/docs/api/1/get/search
    query = twitter.search_tweets_iterable(tso)

    for result in query:
        state = 0
        if result["place"]:
            user = result["user"]["screen_name"]
            text = result["text"]
            text = text.encode('utf-8', 'replace')
            place = result["place"]["full_name"]
            state = place.split(",")[1]
        if linearSearch(state,ustates):
            print state
            # now write this row to our CSV file
            row = [ user, text, place ]
            csvwriter.writerow(row)
            result_count += 1
        last_id = result["id"]

    print "got %d results" % result_count

csvfile.close()

I am trying to categorize the tweets by state using my array ustates, but the second if block doesn't seem to work, and I can't figure out why. What I did was a linear search: if the state extracted from the tweet equals an item in my array, I write that row to a CSV file.

  • I don't see the point of defining a linear search that does exactly the same thing as `state in ustates`. Also, are you sure that `state` is an uppercase string? Try `state.upper() in ustates`. – Copperfield Sep 04 '16 at 23:45
  • I took your advice and looked back at my code. The first problem is that `place.split(",")` may return only one element, so `place.split(",")[1]` doesn't exist. – FattGuy Sep 10 '16 at 23:02
  • I used your advice to compare the state against `ustates` and figured out I had to change the items in my array to unicode. But I still got no results. – FattGuy Sep 10 '16 at 23:07
  • Well, I don't know what the problem may be, but do some basic debugging: put prints everywhere using functions like `repr` or `type`; maybe the types are incompatible, or the string has spaces in it (which you can see with `repr`). – Copperfield Sep 11 '16 at 00:28
  • @Copperfield I tested it out and found I had an extra space after the comma. – FattGuy Sep 11 '16 at 02:24
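
Putting the comment thread together, here is a minimal sketch of the fix being discussed: extract the state code defensively, guard against a `full_name` that contains no comma, and strip the stray space. The helper name `extract_state` is hypothetical, not part of the original code.

def extract_state(full_name):
    # US places usually come back as "City, ST" (e.g. "Seattle, WA"),
    # but some have no comma at all, so guard against that first
    parts = full_name.split(",")
    if len(parts) < 2:
        return None
    # drop the leading space after the comma and normalise the case
    return parts[1].strip().upper()

# extract_state("Seattle, WA")   -> "WA"
# extract_state("United States") -> None

With that, `state = extract_state(place)` followed by `state in ustates` (or the original `linearSearch(state, ustates)`) matches as expected.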

1 Answer


As it looks like the problem is some whitespace remaining, you can use .strip() to remove it:

>>> x=" WY "
>>> x.strip()
'WY'
>>> 

Also, some other tips:

  1. To speed up the membership test in ustates, use a set instead of a list: a set has constant-time membership checks, while a list requires a linear search (see the quick comparison sketch after this list).

  2. The preferred way to open a file is with a context manager, which ensures the file is closed at the end of the block or if an error occurs inside it. Also, use open instead of file.
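
For a rough idea of what tip 1 buys you, here is a small sketch you could run yourself (the short list stands in for the full ustates; absolute timings will vary by machine):

import timeit

states_list = ["AL", "AK", "AZ", "WY", "USA"]   # stand-in for the full ustates list
states_set = set(states_list)

# a list membership test scans the elements one by one (linear time),
# while a set membership test hashes the value (constant time on average)
list_time = timeit.timeit(lambda: "USA" in states_list, number=1000000)
set_time = timeit.timeit(lambda: "USA" in states_set, number=1000000)
print("list: %f  set: %f" % (list_time, set_time))

With only about 60 entries the difference per lookup is tiny, but it adds up when every tweet is checked.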

With those tips the code should look like this:

#!/usr/bin/python

... # all the previous stuff

# Create a set of USA states
ustates = {  
           "AL", "AK", "AS", "AZ", "AR",
           "CA", "CO", "CT",
           "DE", "DC",
           "FM", "FL",
           "GA", "GU",
           "HI",
           "ID", "IL", "IN", "IA",
           "KS", "KY",
           "LA",
           "ME", "MH", "MD", "MA", "MI", "MN", "MS", "MO", "MT", "MP",
           "NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND",
           "OH", "OK", "OR",
           "PW", "PA", "PR",
           "RI",
           "SC", "SD",
           "TN", "TX",
           "UT",
           "VT", "VI", "VA",
           "WA", "WV", "WI", "WY",
           "USA"
           } # this arrangement just takes fewer lines, while grouping the entries alphabetically


# open a file to write (mode "w"), and create a CSV writer object
with open(outfile,"w") as csvfile:
    ...    # the rest is the same

    while result_count <  num_results:
        # perform a search based on latitude and longitude
        # twitter API docs: https://dev.twitter.com/docs/api/1/get/search
        query = twitter.search_tweets_iterable(tso)

        for result in query:
            state = 0
            if result["place"]:
                ... # all the other stuff
                state = state.strip()     # <-- the strip part; add .upper() if needed, just in case
            if state in ustates:
                ... # all the other stuff
            ... # the rest of stuff

        print "got %d results" % result_count
Copperfield