I am switching from twitteR to rtweet to automatically search Twitter for new tweets (twitteR seems to truncate the text at 140 characters). The searchTwitter() function in the twitteR package has a sinceID argument that let me search for only the newest tweets and append them to my existing dataset. I cannot find a similar argument in the search_tweets() function of the rtweet package. Is there any way to download only the newest tweets, rather than downloading the entire corpus again and then deleting duplicates?
Here is the function and an example of what I currently use:
library(tidyverse)
library(tidytext)
library(twitteR)
# FUNCTION ----------
searchtwitterlastweek_ft <- function(topic, sinceID) {
  # restrict the search to the last week
  today <- as.character(Sys.Date())
  lastweek <- as.character(Sys.Date() - 6)
  # sinceID limits the results to tweets newer than that ID
  searchtwitterfortopic <- searchTwitteR(topic, n = 15000, since = lastweek, until = today, sinceID = sinceID)
  if (length(searchtwitterfortopic) > 0) {
    twListToDF(searchtwitterfortopic)
  } else {
    # nothing new: return an empty data frame
    data.frame()
  }
}
# LOAD DATASET FROM PREVIOUS WEEKS ---------
load("DATA/rstats.Rda")
df_r <- df_r %>%
  arrange(desc(id))
# figure out last ID, i.e. last tweet on subject
lastid <- first(df_r$id)
df_temporary <- searchtwitterlastweek_ft("#rstats", lastid)
df_r <- rbind(df_r, df_temporary) %>%
  arrange(desc(id))
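
The "find the last ID, then append" step above can be sketched without any API calls. This is only a minimal illustration, assuming the id column holds status IDs as character strings; newest_id() and append_new() are hypothetical helpers, not part of twitteR or rtweet, and the toy data frames stand in for df_r and df_temporary.

```r
# Hypothetical helper: pick the newest status ID from a vector of IDs
# stored as character strings (assumption: IDs are non-negative integers
# without leading zeros).
newest_id <- function(ids) {
  ids <- as.character(ids)
  # A longer digit string is a larger number; equal lengths are compared
  # character by character. method = "radix" sorts strings in the C locale.
  ids[order(nchar(ids), ids, decreasing = TRUE, method = "radix")][1]
}

# Hypothetical helper: append only rows whose id is not already present.
append_new <- function(old, new) {
  if (nrow(new) == 0) return(old)
  new <- new[!(new$id %in% old$id), , drop = FALSE]
  rbind(old, new)
}

# Toy data standing in for df_r and df_temporary
old <- data.frame(id = c("999", "1000"), text = c("a", "b"), stringsAsFactors = FALSE)
new <- data.frame(id = c("1000", "1001"), text = c("b", "c"), stringsAsFactors = FALSE)

lastid <- newest_id(old$id)
combined <- append_new(old, new)
```

Comparing by length first matters because a plain descending sort of character IDs would rank "999" above "1000". If all stored IDs have the same number of digits, arrange(desc(id)) followed by first() gives the same result.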