
I am trying to scrape tweets of German politicians and parties. I used get_timeline() to loop through a vector of handles, like this:

for (user in afd_functional_users) {

  # get user timeline tweets
  user_tweets <- get_timeline(user, n = Inf, parse = TRUE, include_rts = FALSE,
                              exclude_replies = TRUE, trim_user = TRUE,
                              since_id = "1211587346791063552",
                              max_id = "1609503765043855360",
                              retryonratelimit = FALSE,
                              verbose = TRUE)
  
  # add the user handle as a column to the dataframe
  user_tweets$handle <- user
  
  # append the user's tweets to the main dataframe
  afd_df <- rbind(afd_df, user_tweets)
  
  # pause briefly to avoid rate limit errors
  Sys.sleep(5)
}

(The tweet IDs come from tweets on my personal account that I posted on the start and end dates of the period I am interested in, 1 January 2020 to 31 December 2022.)

This worked reasonably well for most users. For some of them, however, it did not return all tweets in the timeframe but cut off at a point that was different for every affected user. The number of tweets returned also differs: for @AfD it collects 1003 tweets and for @AfDimBundestag 2718 tweets. In both cases the results, which start from the newest tweets, stop well short of 1 January 2020: in the middle of 2020 and 2021 respectively. This affects about 10-20% of the accounts I have collected tweets from; for the rest everything is fine.
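One way to see which accounts were cut off is to compare each handle's oldest returned tweet with the intended start date. The following is a minimal sketch, assuming the combined data frame has a POSIXct `created_at` column and a `handle` column. `collect_timelines()`, `oldest_per_handle()`, the `fetch` argument, and the fake data are hypothetical names used for illustration and are not part of rtweet; in real use `fetch` would wrap the `get_timeline()` call shown above.

```r
# Hypothetical helper: `fetch` is any function that takes a handle and
# returns a data frame of tweets (e.g. a wrapper around get_timeline()).
collect_timelines <- function(handles, fetch) {
  pieces <- lapply(handles, function(handle) {
    tweets <- fetch(handle)
    # Skip empty results; assigning a column to a 0-row data frame errors
    if (nrow(tweets) == 0) return(NULL)
    tweets$handle <- handle
    tweets
  })
  # Bind once at the end instead of growing a data frame inside the loop
  do.call(rbind, Filter(Negate(is.null), pieces))
}

# Hypothetical helper: report the oldest tweet per handle and flag handles
# whose oldest tweet is more than `slack_days` after the start date.
oldest_per_handle <- function(df, start_date, slack_days = 7) {
  oldest <- vapply(split(as.numeric(df$created_at), df$handle), min, numeric(1))
  handles <- names(oldest)
  oldest <- unname(oldest)
  cutoff <- as.numeric(start_date) + slack_days * 86400
  data.frame(
    handle = handles,
    oldest = as.POSIXct(oldest, origin = "1970-01-01", tz = "UTC"),
    truncated = oldest > cutoff,
    stringsAsFactors = FALSE
  )
}

# Fake data standing in for API results (assumption, for illustration only)
fake_fetch <- function(handle) {
  days <- switch(handle,
                 a = c("2020-01-02", "2021-05-01"),
                 b = c("2021-06-15", "2022-03-01"),
                 character(0))
  data.frame(created_at = as.POSIXct(days, format = "%Y-%m-%d", tz = "UTC"),
             stringsAsFactors = FALSE)
}

all_tweets <- collect_timelines(c("a", "b", "c"), fake_fetch)
report <- oldest_per_handle(all_tweets, as.POSIXct("2020-01-01", tz = "UTC"))
print(report)
```

Handles that return nothing at all (here `"c"`) do not appear in the report, so they need to be checked separately against the input vector.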

The first fix I tried was to call get_timeline() again for the affected users, setting max_id to the ID of the oldest tweet already scraped instead of the generic end-date tweet. I tried this both in a loop and for individual users. Here is an example for the @AfD account:

user_tweets <- get_timeline("AfD", n = Inf, parse = TRUE, include_rts = FALSE,
                            exclude_replies = TRUE, trim_user = TRUE,
                            since_id = "1211587346791063552",
                            max_id = "1321067464114032642",
                            retryonratelimit = TRUE,
                            verbose = TRUE)

The code runs fine, but checking user_tweets returns 0 observations. I know for a fact that the accounts have tweeted in the timeframe, and have checked that manually via the Twitter search.

I have also tried using a version of search_tweets() and search_tweets2() to get the tweets of the concerned users, but this has not worked either.

Does anyone have a solution to this problem? I know that some things unfortunately no longer work because the old Twitter API is being shut down, but I hope something can be found. Let me know if you need more info.

Jule
  • Have you checked that the tweets are within the ID limits? Get those tweets and check the dates between them. Have you checked that you didn't run out of your monthly API limit? I don't know which authentication mechanism you use, but that can also work against retrieving all the data. – llrs Apr 12 '23 at 13:29
  • @llrs Thanks for your reply! I'm fairly certain that the ID limits are correct, as the cutoff point is different for each of the problematic accounts. For the API limit, for some reason Twitter still shows that I collected 0 tweets this month even if I definitely collected several thousand, so I can't check it, that's probably related to the API change thing. But I don't think I actually collected a million yet, which I should normally be able to. – Jule Apr 12 '23 at 20:14
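The date check on the ID limits suggested in the comments can be done without calling the API, because tweet IDs created since late 2010 are Snowflake IDs that encode their creation time. The following is a minimal sketch in base R; `snowflake_to_time()` is a hypothetical helper name, not an rtweet function. It uses double arithmetic, which cannot hold the full 64-bit ID exactly, so the result should be treated as accurate to roughly a millisecond rather than exact.

```r
# Twitter's Snowflake epoch in milliseconds since 1970-01-01 UTC
twitter_epoch_ms <- 1288834974657

# Hypothetical helper: approximate creation time of a tweet from its ID.
# The top bits of the ID hold milliseconds since the Snowflake epoch;
# the lowest 22 bits (worker and sequence numbers) are discarded.
snowflake_to_time <- function(id) {
  ms <- floor(as.numeric(id) / 2^22) + twitter_epoch_ms
  as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")
}

# The three IDs used in the question
ids <- c(since_id  = "1211587346791063552",
         max_id    = "1609503765043855360",
         cutoff_id = "1321067464114032642")
bounds <- snowflake_to_time(ids)
print(data.frame(id = ids, created = format(bounds, "%Y-%m-%d %H:%M:%S")))
```

Passing the IDs as strings avoids R silently rounding them when they are typed as numeric literals.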

1 Answer


I don't know why the API isn't working, but if you have academic access you can get all the tweets with something like this:

tweets_AfD <- tweet_search_all("from:AfD", 
                               since_id = "1211587346791063552",
                               until_id = "1321067464114032642", n = Inf)
tw <- lookup_tweets(tweets_AfD$id)

This requires the latest development version of rtweet, 1.2.0.9002 (I just pushed it to the repository).

llrs
  • Thank you so much for taking the time to answer me. I did download the 1.2.0.9001 version of rtweet, and I can access the help documentation for the tweet_search_all function, yet when I try to run it it returns "Error in tweet_search_all("from:AfD", since_id = "1211587346791063552", : could not find function "tweet_search_all"". Do you happen to have any idea why that is? I am really sorry for bothering, as you can tell I'm still pretty much an rtweet noob. – Jule Apr 14 '23 at 09:24
  • Oh, sorry, I forgot to export some functions. You can call it with rtweet:::tweet_search_all(...). But you still need academic access or paid access (if you are new to the API I doubt you have paid for this). Simply put, it is not a good time to use the Twitter API; the new plans make it harder to retrieve the data and to develop tools to access it. – llrs Apr 14 '23 at 10:15
  • You're right, apparently the issue seems to be with the access tier, as the code ran now but I am getting a result that I do not have the correct level of access. I'll have to see if I can find another way around that (and yes, I'm very aware of the poor timing of this unfortunately). Again thanks! – Jule Apr 14 '23 at 15:21