I am using the Twitter Streaming API to get tweets matching certain keywords. The output is written to a file. I do a basic comparison based on how far from a venue each tweet originates, and I write the tweets to separate files accordingly.
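from math import radians, sin, cos, asin, sqrt

# Tweet coordinates (d is the decoded tweet) and venue coordinates, in degrees
lat2=float(d['geo']['coordinates'][0])
long2=float(d['geo']['coordinates'][1])
lat1=venue_latitude
long1=venue_longitude
# Haversine formula: great-circle distance between the two points
lon1, lat1, lon2, lat2 = map(radians, [long1, lat1, long2, lat2])
dlon = lon2 - lon1
dlat = lat2 - lat1
a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
c = 2 * asin(sqrt(a))
# 6367 km Earth radius, converted from kilometres to miles
distance = 6367 * c * 0.621371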
if distance < 1:
    user= d['user']['screen_name']
    user_id=d['user']['id']
    file=open('Tweets_within_one_mile.txt','a')
    users.append(user)
    text=str(user) + str(user_id)+ "qwertyasdfgzxcvb" + str(distance) + d['text']
    u = text.encode('utf-8')
    file.write(u)
    file.close()
if distance > 2 and distance < 60:
    user= d['user']['screen_name']
    user_id=d['user']['id']
    file=open('Tweets_within_sixty_miles.txt','a')
    users.append(user)
    text=str(user) + str(user_id) + str(co_lon2) +d['text']
    u = text.encode('utf-8')
    file.write(u)
    file.close()
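To make the distance step easier to check on its own, it can be pulled out into a function. This is only a minimal sketch of the same haversine calculation used above. The function name `distance_in_miles` and the two constant names are hypothetical and do not appear in my script.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6367    # same radius as in the snippet above
KM_TO_MILES = 0.621371    # same conversion factor as in the snippet above


def distance_in_miles(lat1, long1, lat2, long2):
    """Great-circle distance in miles between two points given in degrees."""
    # Convert all four coordinates from degrees to radians.
    lon1, lat1, lon2, lat2 = map(radians, [long1, lat1, long2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # Haversine formula.
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * asin(sqrt(a))
    return EARTH_RADIUS_KM * c * KM_TO_MILES
```

Calling `distance_in_miles(venue_latitude, venue_longitude, lat2, long2)` should give the same value as `distance` in the snippet above.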
The last time I ran the script, around 30,000 tweets were collected, but only 20,000 of them were written to the file completely. The remaining 10,000 were written incompletely.
Is there a problem with the Python output buffer?
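One way I could try to rule the buffer out is to write each tweet as a single newline-terminated line and let a `with` block flush and close the file. The following is only a sketch under those assumptions. The helper name `append_record` and the tab-separated layout are hypothetical and not part of my script. It is written for Python 3, and I have not confirmed that it changes the behaviour described above.

```python
import io


def append_record(path, fields):
    """Append one record to path as a single tab-separated line."""
    # Replace embedded newlines so that one tweet stays on one line.
    line = u"\t".join(str(f).replace("\n", " ") for f in fields) + u"\n"
    # The with-block flushes and closes the file, even if write() raises.
    with io.open(path, "a", encoding="utf-8") as out:
        out.write(line)
```

With this, the first branch would call something like `append_record('Tweets_within_one_mile.txt', [user, user_id, distance, d['text']])`, and each record could then be counted by counting lines in the file.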