I am using the Twitter Streaming API to get tweets matching certain keywords. The output is written to a file. I do a basic comparison based on the distance between the tweet's origin and a venue, and write to separate files accordingly.

        lat2=float(d['geo']['coordinates'][0])
        long2=float(d['geo']['coordinates'][1])
        lat1=venue_latitude
        long1=venue_longitude
        lon1, lat1, lon2, lat2 = map(radians, [long1, lat1, long2, lat2])
        dlon = lon2 - lon1 
        dlat = lat2 - lat1 
        a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
        c = 2 * asin(sqrt(a)) 
        distance = 6367 * c * 0.621371
        if distance < 1:
            user= d['user']['screen_name']
            user_id=d['user']['id']
            file=open('Tweets_within_one_mile.txt','a')
            users.append(user)
            text=str(user)  + str(user_id)+ "qwertyasdfgzxcvb" + str(distance) + d['text'] 
            u = text.encode('utf-8')
            file.write(u)
            file.close()
        if distance > 2 and distance < 60:
            user= d['user']['screen_name']
            user_id=d['user']['id']
            file=open('Tweets_within_sixty_miles.txt','a')
            users.append(user)
            text=str(user)  + str(user_id) + str(co_lon2) +d['text']
            u = text.encode('utf-8')
            file.write(u)
            file.close()

The last time I ran the script, around 30,000 tweets were collected, but only 20,000 of them were written to the file completely. The remaining 10,000 were written incompletely.

Is there a problem with the Python output buffer?

jamylak
shivram

1 Answer

I can't be certain this is the exact reason only some of the tweets were saved, but the way you constructed your if statements might have something to do with it.

For example:

if distance < 1:
    print("foo")

if distance > 2 and distance < 60:
    print("bar")

Note what these conditions do:

  1. The first check matches values less than, but NOT equal to, 1.
  2. Numbers between 1 and 2 are not matched by either check (e.g., 1.1 to 1.9).
  3. Numbers equal to 2 or 60 are not matched.
  4. Numbers greater than 60 are not matched.
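To make the gaps concrete, here is a minimal sketch that mirrors the two conditions from the question (`distance < 1` and `2 < distance < 60`). The function name `bucket` and its return labels are made up for illustration; they are not part of the original script.

```python
def bucket(distance):
    """Return which file a tweet would go to, or None if it is skipped.

    Hypothetical helper: the conditions are copied from the question,
    the names and labels are assumptions for illustration only.
    """
    if distance < 1:
        return "one_mile"
    if distance > 2 and distance < 60:
        return "sixty_miles"
    # Falls through: 1 <= distance <= 2, or distance >= 60
    return None


# Probe a few distances (in miles) and collect the ones that are skipped.
samples = [0.5, 1, 1.5, 2, 30, 60, 100]
skipped = [d for d in samples if bucket(d) is None]
print(skipped)
```

Any tweet whose distance lands in `skipped` is never written to either file, so a count like this on real data would show how many tweets the range checks alone discard.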

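Again, I'm not sure of the exact values you're trying to obtain, but maybe this will help (it also captures the boundary values 1, 2 and 60):

if distance <= 1:
    print("foo")

if 2 <= distance <= 60:
    print("bar")

Note that the second check needs both bounds to hold (a chained comparison, or "and"); with "or" it would be true for every distance. Distances strictly between 1 and 2, or greater than 60, are still skipped, so widen the ranges if you want those as well.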
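Separately from the range checks, the code in the question writes each tweet with no newline and no separator between most fields, so records can run together in the file and look incomplete when read back. I can't confirm that this is what happened here; the following is only a sketch, assuming Python 3. The helpers `append_record` and `read_records`, the tab-separated layout, and the sample data are all made up for illustration.

```python
import os
import tempfile


def append_record(path, fields):
    """Append one record as a single tab-separated, newline-terminated line."""
    # Replace embedded tabs and line breaks so each record stays on one line.
    cleaned = [
        str(f).replace("\t", " ").replace("\r", " ").replace("\n", " ")
        for f in fields
    ]
    # The with-block closes (and so flushes) the file even if an error occurs.
    with open(path, "a", encoding="utf-8") as out:
        out.write("\t".join(cleaned) + "\n")


def read_records(path):
    """Read the file back as a list of field lists, one per record."""
    with open(path, encoding="utf-8") as src:
        return [line.rstrip("\n").split("\t") for line in src]


# Usage with made-up sample data written to a temporary file.
demo_path = os.path.join(tempfile.mkdtemp(), "tweets_demo.txt")
append_record(demo_path, ["some_user", 12345, 0.42, "first line\nsecond line"])
append_record(demo_path, ["other_user", 67890, 0.87, "caf\u00e9 tweet"])
records = read_records(demo_path)
print(len(records))
```

With one record per line, counting lines in the output file gives a direct check of how many tweets were actually written, which helps separate a buffering problem from a filtering or formatting one.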
Leb