
About my project: I am using the Academic Twitter API and the academictwitteR package to first scrape all tweets of Amnesty International UK. This has worked fine. The next step is to use the conversation IDs of those ~30,000 tweets to retrieve the entire threads behind them, which is where my problem lies.

This is the code I am running:

```r
ai_t <-
  get_all_tweets(
    users = "AmnestyUK",
    start_tweets = "2008-01-01T00:00:00Z",
    end_tweets = "2022-11-14T00:00:00Z",
    bearer_token = BearerToken,
    n = Inf
  )
```

```r
conversations <- list()

# `list` holds the conversation IDs extracted from ai_t
for (i in list) {
  x <- get_all_tweets(
    start_tweets = "2008-01-01T00:00:00Z",
    end_tweets = "2022-11-14T00:00:00Z",
    bearer_token = BearerToken,
    n = Inf,
    conversation_id = i
  )
  # wrap x in list() so each data frame stays intact instead of being
  # flattened into a list of columns by c()
  conversations <- c(conversations, list(x))
}
# stack all threads into one data frame
conversations <- dplyr::bind_rows(conversations)
```

The problem is the sheer number of individual queries: the package only accepts one conversation_id per call, and passing the whole list directly instead of looping produces an error, hence the loop. Even apart from the rate-limit sleep timer, a single query takes anywhere from ~3 seconds, when few tweets are retrieved, to considerably longer when there are, say, 2,000 tweets with that conversation_id. A rough calculation already puts this at multiple days of runtime, if I am not making a mistake.
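One idea I have been wondering about is batching several IDs into a single query joined with OR, since the v2 full-archive search supports `conversation_id:` as a query operator and academic access allows queries up to 1024 characters. This is only a sketch under my assumptions: that `get_all_tweets()` passes an operator-style `query` string through unchanged, and that a batch of ~25 IDs stays under the length limit:

```r
# Sketch: fetch threads in batches of conversation IDs joined with OR.
# Assumes `ids` is the vector of conversation IDs and that each
# "conversation_id:<id>" term plus " OR " keeps the query under the
# 1024-character limit of the academic full-archive search.
batch_size <- 25
batches <- split(ids, ceiling(seq_along(ids) / batch_size))

conversations <- list()
for (b in seq_along(batches)) {
  query <- paste0("conversation_id:", batches[[b]], collapse = " OR ")
  conversations[[b]] <- get_all_tweets(
    query = query,
    start_tweets = "2008-01-01T00:00:00Z",
    end_tweets = "2022-11-14T00:00:00Z",
    bearer_token = BearerToken,
    n = Inf
  )
}
conversations <- dplyr::bind_rows(conversations)
```

If this works, it would cut the number of requests by roughly a factor of 25, but I have not verified it myself.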

The code itself seems to work fine; I have tried it with a short sample of the conversation IDs:

```r
list2 <- list[c(1:3)]
```

```r
for (i in list2) {
  x <- get_all_tweets(
    start_tweets = "2008-01-01T00:00:00Z",
    end_tweets = "2022-11-14T00:00:00Z",
    bearer_token = BearerToken,
    n = Inf,
    conversation_id = i
  )
  conversations <- c(conversations, list(x))
}
```
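Since a full run will take days either way, I am also planning to write each page of results to disk so a crash does not cost me everything fetched so far. A minimal sketch using the documented `data_path` and `bind_tweets` arguments of `get_all_tweets()` (the folder name is my own choice):

```r
# Sketch: persist each page of results as JSON under data_path so an
# interrupted multi-day run does not lose what was already fetched.
# "tweet_data/" is a hypothetical folder name.
for (i in list) {
  get_all_tweets(
    start_tweets = "2008-01-01T00:00:00Z",
    end_tweets = "2022-11-14T00:00:00Z",
    bearer_token = BearerToken,
    n = Inf,
    conversation_id = i,
    data_path = "tweet_data/",  # each page is written here as it arrives
    bind_tweets = FALSE         # skip in-memory binding during the run
  )
}

# afterwards, re-assemble everything from the JSON files in one go
conversations <- bind_tweets(data_path = "tweet_data/")
```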

Does anybody have a solution for this, or is this already the most efficient way, meaning it will simply take forever? I am unfortunately not experienced in Python at all, but if there is an easier way in that language I would also be interested.

Cheers
