1

I am trying to download tweets from the Reuters (@reuters) Twitter account for the month of November 2019.

I am using tweepy in Python, and this is my code:

# pip install tweepy   (run this in a shell, not inside Python)
import tweepy as tw

#Keys
consumer_key = "..."
consumer_secret = "..."
access_token = "..."
access_token_secret = "..."

# Login
auth = tw.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tw.API(auth, wait_on_rate_limit=True)

#Get user's tweets
tweets = tw.Cursor(api.user_timeline,
                   id="reuters",
                   lang="en",
                   since="2019-11-01",
                   until="2019-11-30").items()

all_tweets = [tweet.text for tweet in tweets]

all_tweets[:100]

The "until" parameter does not seem to be working, because the tweets that my code pulls include the latest tweets.

absk

4 Answers

0

The tweepy library only supports Twitter's older standard search API at this time, and the standard search only covers 7 days of history. In order to search as far back as November 2019, you would need to use either the premium full-archive search API, or the enterprise full-archive search. These APIs are both commercial, but the premium API has a free tier called "sandbox" that would also work. In Python, you could use the search-tweets library.

The timeline method mentioned in the other answer would also be an option, but it would depend on Tweets from November being within the scope of the timeline API, which supports up to 3200 Tweets back from today.
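A rough sketch of that timeline approach: since the date window is not applied by the timeline call itself, it can be applied client-side to each tweet's `created_at`. The helper name `tweets_in_range` is hypothetical, and the sketch assumes the tweets arrive newest-first with naive UTC `created_at` datetimes (as tweepy 3.x status objects provide). Stand-in objects are used so the example runs without credentials.

```python
from datetime import datetime
from types import SimpleNamespace


def tweets_in_range(tweets, start, end):
    """Yield tweets with start <= created_at < end.

    Assumes `tweets` is ordered newest-first, so iteration can stop at the
    first tweet older than `start`.
    """
    for tweet in tweets:
        if tweet.created_at < start:
            break  # everything after this one is older still
        if tweet.created_at < end:
            yield tweet


# Stand-in objects for demonstration only; with tweepy 3.x one would pass
# tw.Cursor(api.user_timeline, id="reuters").items() instead (assumption).
sample = [
    SimpleNamespace(text="december", created_at=datetime(2019, 12, 2, 9, 0)),
    SimpleNamespace(text="late nov", created_at=datetime(2019, 11, 30, 23, 59)),
    SimpleNamespace(text="early nov", created_at=datetime(2019, 11, 1, 0, 0)),
    SimpleNamespace(text="october", created_at=datetime(2019, 10, 31, 22, 0)),
]

november = [
    t.text
    for t in tweets_in_range(sample, datetime(2019, 11, 1), datetime(2019, 12, 1))
]
print(november)
```

This only helps if the November 2019 tweets fall within the most recent 3200 tweets of the account.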

Andy Piper
0

Below are two simple ways to extract tweets for a specific user and a specific date range.

Solution 1: using TwitterAPI. As mentioned by andy_piper, you need premium or sandbox access, and a premium account is expensive. Unless you are extracting a huge corpus from Twitter, the free sandbox account is more than enough. You can enable sandbox access at https://developer.twitter.com/en/pricing/aaa-all, which gives you access to the archive with a limited number of requests.

Create a dev environment label linked to your Twitter account: go to the dev environments page in your Twitter developer account and create a label for the sandbox. Once the label is configured, the code below will extract the corresponding tweets (change maxResults as needed).

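from TwitterAPI import TwitterAPI

# consumer_key, consumer_secret, access_token and access_token_secret
# are the same four credentials used in the question's code
Product = 'fullarchive'  # full-archive search product
label = 'Dev'            # the dev environment label created above
api = TwitterAPI(consumer_key, consumer_secret, access_token, access_token_secret)
tweets = api.request('tweets/search/%s/:%s' % (Product, label),
                     {'query': 'from:reuters', 'maxResults': '10',
                      'fromDate': '201911010000', 'toDate': '201911300000'})

# Print the id of each returned tweet
for tweet in tweets:
    print(tweet['id'])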
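The `fromDate` and `toDate` values in the request above are UTC timestamps in `YYYYMMDDHHMM` form. As a small sketch, they can be built for a whole calendar month as follows. `month_window` is a hypothetical helper, and it assumes `toDate` should be the first minute of the following month so that the last day is not cut off (`201911300000` stops at the start of November 30).

```python
from datetime import datetime


def month_window(year, month):
    """Return (fromDate, toDate) strings covering one calendar month."""
    start = datetime(year, month, 1)
    # First day of the next month, rolling over the year in December
    if month == 12:
        end = datetime(year + 1, 1, 1)
    else:
        end = datetime(year, month + 1, 1)
    fmt = "%Y%m%d%H%M"
    return start.strftime(fmt), end.strftime(fmt)


from_date, to_date = month_window(2019, 11)

# Same parameter names as in the request above
params = {
    "query": "from:reuters",
    "maxResults": "10",
    "fromDate": from_date,
    "toDate": to_date,
}
print(params)
```

The resulting dictionary could be passed as the second argument of `api.request` in place of the hard-coded one.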

Solution 2: using the GetOldTweets3 library. I would not prefer this approach, since I am not sure about its licence. It works like a charm without even a Twitter developer account, but it may conflict with Twitter's policies. Here is the code anyway.

import GetOldTweets3 as got

username = 'reuters'
count = 100
# Build the search criteria: user, maximum number of tweets, and date window
tweetCriteria = got.manager.TweetCriteria().setUsername(username)\
                                           .setMaxTweets(count)\
                                           .setSince("2019-11-01")\
                                           .setUntil("2019-11-30")
tweets = got.manager.TweetManager.getTweets(tweetCriteria)
for tweet in tweets:
    print(tweet.id, tweet.author_id, tweet.date)

Reference: https://pypi.org/project/GetOldTweets3/ https://github.com/geduldig/TwitterAPI/blob/master/examples/premium_search.py

premkumar
  • The latter is against Twitter's Terms of Service, so you would be better off using the official API; otherwise your IP address is likely to be blocked. – Andy Piper May 11 '20 at 21:42
  • Thanks, this worked! Sadly, it only allows 5K tweets to be pulled per month, but it's better than nothing... – absk May 13 '20 at 06:14
0

I have the answer. You cannot do this without going premium.

absk
0
import csv
import tweepy

# Input your credentials here
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)

# Search terms, combined with OR
query = ("Womens Day OR internationalwomensday OR internationalwomensday2021 OR "
         "internationalwomensday21 OR women's day OR international women's day OR IWD OR "
         "HappyInternationalWomensDay OR Happy Women's Day OR HappyWomensDay OR "
         "happywomensday OR happyinternationalwomensday")

# Open (or create) the CSV file in append mode; the with-block closes it when done
with open('tweets.csv', 'a', newline='', encoding='utf-8') as csvFile:
    csvWriter = csv.writer(csvFile)
    # api.search is the tweepy 3.x name (renamed search_tweets in tweepy 4).
    # The standard search only covers about the last 7 days of tweets.
    # include_rts, country_code and coordinates are not search parameters,
    # so they are omitted here.
    for tweet in tweepy.Cursor(api.search,
                               q=query,
                               count=100,  # 100 is the maximum page size
                               lang="en",
                               since="2021-03-06",
                               until="2021-03-10").items():
        print(tweet.created_at, tweet.text)
        # Write the text as a str; the file's utf-8 encoding handles non-ASCII
        csvWriter.writerow([tweet.created_at, tweet.text])
graj499