I am switching from twitteR to rtweet to automatically search Twitter for new tweets (twitteR seems to truncate the text at 140 characters). The searchTwitter() function in the twitteR package has a sinceID argument that let me search for only the newest tweets and append them to my existing dataset. I cannot find a similar argument in the search_tweets() function of the rtweet package. Is there a way to download only the newest tweets, rather than downloading the entire corpus and then deleting duplicates?

Here is the function and an example of what I currently use:

library(tidyverse)
library(tidytext)
library(twitteR)

# FUNCTION ----------
searchtwitterlastweek_ft <- function(topic, sinceID){
  today <- as.character(Sys.Date())
  lastweek <- as.character(Sys.Date() - 6)
  searchtwitterfortopic <- searchTwitteR(topic, n = 15000, since = lastweek, until = today, sinceID = sinceID)
  # convert to a data frame only when the search returned something
  if (length(searchtwitterfortopic) > 0) {
    twListToDF(searchtwitterfortopic)
  } else {
    data.frame()
  }
}

# LOAD DATASET FROM PREVIOUS WEEKS ---------
load("DATA/rstats.Rda")

df_r <- df_r %>%
  arrange(desc(id))

# figure out last ID, i.e. last tweet on subject
lastid <- first(df_r$id)

df_temporary <- searchtwitterlastweek_ft("#rstats", lastid)

df_r <- rbind(df_r, df_temporary) %>%
  arrange(desc(id))
B_alban

1 Answer

Look at the stream_tweets() function of the rtweet package; that should help you =)
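
For the append-only workflow described in the question, here is a minimal sketch rather than a definitive implementation. It assumes an rtweet version whose search_tweets() accepts a since_id argument (I believe rtweet 1.0 and later do; check ?search_tweets for your installed version). The helpers newest_id(), fetch_rtweet() and update_tweets() are hypothetical names, and the id_str column name is an assumption about the returned data frame. The fetcher is passed in as a function so the logic can be tried without Twitter credentials.

```r
# Hypothetical helper: largest tweet ID in a vector of IDs.
# IDs are compared as strings (by length, then digit by digit) because
# tweet IDs are too large to be stored exactly as R doubles.
newest_id <- function(ids) {
  ids <- as.character(ids)
  ids[order(nchar(ids), ids, decreasing = TRUE, method = "radix")][1]
}

# Default fetcher. ASSUMPTION: the installed rtweet's search_tweets()
# accepts `since_id`; this function is only called when you run an update.
fetch_rtweet <- function(topic, since_id) {
  rtweet::search_tweets(topic, n = 15000, since_id = since_id)
}

# Hypothetical helper: fetch only tweets newer than those already stored,
# put them on top of the existing rows, and drop accidental duplicates.
update_tweets <- function(existing, topic, fetch = fetch_rtweet,
                          id_col = "id_str") {
  fresh <- fetch(topic, newest_id(existing[[id_col]]))
  if (is.null(fresh) || nrow(fresh) == 0) {
    return(existing)
  }
  # keep only the columns both data frames share so rbind() lines up
  shared <- intersect(names(existing), names(fresh))
  combined <- rbind(fresh[, shared, drop = FALSE],
                    existing[, shared, drop = FALSE])
  combined[!duplicated(combined[[id_col]]), , drop = FALSE]
}
```

With data collected by rtweet, a call could look like df_r <- update_tweets(df_r, "#rstats"). Note that stream_tweets() collects tweets live from the moment it is started, so on its own it will not backfill tweets posted between two runs.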

Jérémy