
I have started using the rtweet package and, so far, I have had good results with my query, language, and geocode parameters. However, I still do not know how to collect Twitter data from within the last 7 days.

For example, in the next code chunk I want to extract some data for 7 days, but I am not sure if the collected tweets will be from 2017-06-29 until 2017-07-05 (the following 7 days) or from 2017-06-22 until 2017-06-29 (the previous 7 days):

Stream all tweets mentioning AMLO or lopezobrador for 7 days:

stream_tweets("AMLO,lopezobrador",
          timeout = 60*60*24*7,
          file_name = "tweetsaboutAMLO.json",
          parse = FALSE)

Read in the data as a tidy tbl data frame:

AMLO <- parse_stream("tweetsaboutAMLO.json")

Do you know if there are any arguments in rtweet to specify the time frame when using the search_tweets() or stream_tweets() functions?

  • stream_tweets() keeps a connection to Twitter's streaming API alive for the time you specify, starting from the moment you run it. In other words, it catches tweets from the present into the future. It does not work for getting tweets from the past, or at least not tweets from more than a few minutes ago. Currently, there is no public way to search by keyword for tweets older than 7-9 days. You can, on the other hand, query the timeline of a specific user (see the sketch below). – Nicolás Velasquez Jun 30 '18 at 15:41
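
For reference, querying a specific user's timeline with rtweet looks like this (a minimal sketch; the screen name is a placeholder, not from the thread):

## Fetch up to 1,000 of one user's most recent tweets.
## "some_user" is a hypothetical screen name.
user_tl <- get_timeline("some_user", n = 1000)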

2 Answers


So, to answer your question about how to write it more efficiently, you could try a for loop or a list apply. Here I show the for loop.

First, create a vector with the 4 dates you are calling.

fechas <- seq.Date(from = as.Date("2018-06-24"), to = as.Date("2018-06-27"), by = 1)
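
This produces a four-element Date vector:

fechas
#> [1] "2018-06-24" "2018-06-25" "2018-06-26" "2018-06-27"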

Then create an empty data.frame to store your tweets.

df_tweets <- data.frame()

Now, loop along your list and populate the empty data.frame.

for (i in seq_along(fechas)) {
  ## mexico_coord is assumed to be the geocode string defined earlier in the
  ## question's session, e.g. "23.6,-102.5,500mi".
  df_temp <- search_tweets("lang:es",
                           geocode = mexico_coord,
                           until = fechas[i],
                           n = 100)
  df_tweets <- rbind(df_tweets, df_temp)
}

summary(df_tweets)
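
The "list apply" alternative mentioned above would look roughly like this (a sketch under the same assumptions, binding once at the end instead of growing the data.frame inside the loop):

## Query each date, collect the results in a list, then bind once.
lista <- lapply(fechas, function(fecha) {
  search_tweets("lang:es",
                geocode = mexico_coord,
                until = fecha,
                n = 100)
})
df_tweets <- do.call(rbind, lista)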

On the other hand, the following solution might be more convenient and efficient altogether:

library(tidyverse)
df_tweets2 <- search_tweets("lang:es",
                            geocode = mexico_coord,
                            until = "2018-06-29",  ## or latest date
                            n = 10000)
df_tweets2 %>%
  group_by(as.Date(created_at)) %>%  ## Group the tweets by date of creation
  sample_n(100)  ## Take 100 random tweets for each group, i.e., for each date
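
One caveat, not in the original answer: sample_n() errors if any date's group holds fewer than 100 tweets. With current dplyr, slice_sample() truncates silently instead:

df_tweets2 %>%
  group_by(as.Date(created_at)) %>%
  slice_sample(n = 100)  ## returns the whole group if it has fewer than 100 tweets
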
– Nicolás Velasquez

I already found a way to collect tweets within the past seven days. However, it is not efficient.

rt_24 <- search_tweets("lang:es", 
                       geocode = mexico_coord, 
                       until="2018-06-24",
                       n = 100)

rt_25 <- search_tweets("lang:es",
                       geocode = mexico_coord,
                       until="2018-06-25",
                       n = 100)

rt_26 <- search_tweets("lang:es",
                       geocode = mexico_coord,
                       until="2018-06-26",
                       n = 100)

rt_27 <- search_tweets("lang:es",
                       geocode = mexico_coord,
                       until="2018-06-27",
                       n = 100)

Then, append the data frames:

rbind(rt_24, rt_25, rt_26, rt_27)

Do you know if there is a more efficient way to write this? Maybe using the max_id argument in combination with until?

  • I have not found a significant difference between the results of search_tweets() queries ending on each of the past seven days and queries over all of the past seven days. As for efficiency, you certainly could write a for loop or a purrr::map(). max_id won't necessarily help you if you do not know which screenName/user you are querying for. In other words, your tweet's ID could be higher than mine even if we tweet in the exact same second. Yet all of your future tweets' IDs will be higher than any of your past tweets' IDs (see the sketch after these comments). – Nicolás Velasquez Jun 30 '18 at 23:06
  • Thank you very much for your help Nicolás! – Juan Carlos Gonzalez Ibarguen Jul 01 '18 at 16:55
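
For what it's worth, paging further back with max_id would look roughly like this (a hedged sketch reusing mexico_coord from above; Twitter's max_id returns statuses with IDs at or below the given ID, so the anchor tweet reappears in the second batch):

## First batch of recent tweets.
batch1 <- search_tweets("lang:es", geocode = mexico_coord, n = 100)

## Anchor on the oldest status ID, then page backwards from it.
oldest <- batch1$status_id[which.min(batch1$created_at)]
batch2 <- search_tweets("lang:es", geocode = mexico_coord,
                        n = 100, max_id = oldest)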