I am trying to scrape tweets of German politicians and parties. I used get_timeline() from the rtweet package in a loop over a vector of handles, like this:
library(rtweet)
# start with an empty result; rbind() fills it inside the loop
afd_df <- NULL
for (user in afd_functional_users) {
  # get user timeline tweets
  user_tweets <- get_timeline(user, n = Inf, parse = TRUE, include_rts = FALSE,
                              exclude_replies = TRUE, trim_user = TRUE,
                              since_id = "1211587346791063552",
                              max_id = "1609503765043855360",
                              retryonratelimit = FALSE,
                              verbose = TRUE)
  # add the user handle as a column to the dataframe
  user_tweets$handle <- user
  # append the user's tweets to the main dataframe
  afd_df <- rbind(afd_df, user_tweets)
  # pause briefly to avoid rate limit errors
  Sys.sleep(5)
}
(The two tweet IDs belong to tweets from my personal account that were posted on the days I wanted as start and end dates, i.e. 1 January 2020 and 31 December 2022.)
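As a side note, tweet IDs of this kind are "snowflake" IDs that encode their creation time: the upper bits hold the milliseconds elapsed since Twitter's custom epoch (1288834974657 ms after the Unix epoch), shifted left by 22 bits. The sketch below, in base R only, shows how a boundary ID could be derived from a date and how an ID can be mapped back to an approximate timestamp. The function names are made up for illustration and are not part of rtweet; the code relies on doubles, so the low bits of an existing ID are not preserved, and it assumes sprintf("%.0f") prints large doubles in full on the platform used.

```r
# Twitter's snowflake epoch in milliseconds since 1970-01-01 UTC
twitter_epoch_ms <- 1288834974657

# Hypothetical helper: smallest snowflake ID for a given point in time.
# The result is returned as a string because it exceeds R's integer range.
date_to_snowflake <- function(time) {
  ms <- as.numeric(time) * 1000
  sprintf("%.0f", (ms - twitter_epoch_ms) * 2^22)
}

# Hypothetical helper: approximate creation time of a snowflake ID.
# as.numeric() drops the lowest bits of the ID, which does not matter
# at millisecond resolution.
snowflake_to_time <- function(id) {
  ms <- floor(as.numeric(id) / 2^22) + twitter_epoch_ms
  as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")
}

start_time <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC")
start_id <- date_to_snowflake(start_time)
print(start_id)

# Map the since_id and max_id used in the question back to timestamps
print(snowflake_to_time("1211587346791063552"))
print(snowflake_to_time("1609503765043855360"))
```

Converting the IDs back to timestamps like this is a quick way to confirm that since_id and max_id really bracket the intended period.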
This worked reasonably well for most users. For some of them, however, it did not scrape all the tweets in the timeframe but cut off at a point that was different for every affected user. The number of tweets returned also differs per user: for @AfD it collects 1003 tweets, and for @AfDimBundestag it returns 2718. In both cases the results stop before reaching 1 January 2020 (counting back from the newest tweets), namely in the middle of 2020 and 2021 respectively. I have had this problem with about 10-20% of all accounts I have collected tweets from; for the rest everything is fine.
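To see which accounts are affected and where each timeline stops, the oldest tweet per handle can be compared with the intended start date. Below is a minimal sketch in base R. It uses a small made-up data frame (afd_demo, with invented timestamps) in place of the real afd_df, and assumes the scraped data has a created_at column next to the handle column added in the loop.

```r
# Invented stand-in for afd_df: one row per tweet
afd_demo <- data.frame(
  handle = c("AfD", "AfD", "AfDimBundestag", "AfDimBundestag"),
  created_at = as.POSIXct(c("2020-10-27 12:00:00", "2022-12-30 09:00:00",
                            "2021-06-15 08:00:00", "2022-12-31 18:00:00"),
                          tz = "UTC"),
  stringsAsFactors = FALSE
)

start_date <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC")

# Oldest timestamp per handle, computed on plain numbers so that
# the date class is not lost during aggregation
oldest_num <- vapply(split(afd_demo$created_at, afd_demo$handle),
                     function(x) as.numeric(min(x)), numeric(1))

coverage <- data.frame(
  handle = names(oldest_num),
  n_tweets = as.vector(table(afd_demo$handle)),
  oldest = as.POSIXct(unname(oldest_num), origin = "1970-01-01", tz = "UTC"),
  stringsAsFactors = FALSE
)

# Days between the intended start date and the oldest tweet collected;
# a large value suggests the timeline was cut off early
coverage$days_after_start <- as.numeric(
  difftime(coverage$oldest, start_date, units = "days")
)

print(coverage)
```

A large days_after_start does not prove a cut-off on its own, since an account may simply not have tweeted early in the period, but it narrows down which handles to check by hand.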
The main fix I tried was to call get_timeline() again for the affected users, setting max_id to the ID of the oldest tweet already scraped instead of the generic end-date tweet. I tried this both in a loop and for single users. Here is an example for the @AfD account:
user_tweets <- get_timeline("AfD", n = Inf, parse = TRUE, include_rts = FALSE,
                            exclude_replies = TRUE, trim_user = TRUE,
                            since_id = "1211587346791063552",
                            max_id = "1321067464114032642",
                            retryonratelimit = TRUE,
                            verbose = TRUE)
The code runs without errors, but user_tweets ends up with 0 observations. I know for a fact that the accounts tweeted in this timeframe; I checked that manually via the Twitter search.
I have also tried search_tweets() and search_tweets2() to get the tweets of the affected users, but that did not work either.
Does anyone have a solution to this problem? I know that, unfortunately, some things no longer work because the old Twitter API is being shut down, but I hope something can still be found. Let me know if you need more info.