The Twitter API documentation notes:

> The process of looking for a next_token and including it in a subsequent request can be repeated until all (or some number of) Tweets are collected, or until a specified number of requests have been made. If data fidelity (collecting all matches of your query) is key to your use case, a simple "repeat until request.next_token is null" design will suffice.
What the last sentence is trying to communicate is that you need code that repeats until either (a) no token appears in the response (indicating you have collected all matches) or (b) you have enough data.
In R, this kind of "repeat until" is written as a `while` loop or a `repeat` loop. Here's the structure of your desired loop:
1. Make a request.
2. Store the results in some object/file.
3. Extract the `next_token` from the "meta" attribute of the JSON response.
4. Construct a new query with the `next_token` string.
5. Repeat steps 1-4 until either no `next_token` is in the response OR you've gone through as many pages as you desire.
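As a concrete sketch of steps 3 and 4, suppose the JSON response has already been parsed into an R list (e.g., by `jsonlite::fromJSON()`); the token value below is invented for illustration:

```r
# Hypothetical parsed response; a real one would come from, e.g., jsonlite::fromJSON()
parsed <- list(meta = list(next_token = "b26v89c19zqg8o3f"))

# Step 3: extract the token ($ returns NULL if the field is absent)
next_token <- parsed$meta$next_token

# Step 4: append it to the query string
query <- "https://api.twitter.com/2/tweets/search/recent?query=rstats"
if (!is.null(next_token)) {
  query <- paste0(query, "&next_token=", next_token)
}
```

Because `$` on a list returns `NULL` for a missing element, the same extraction works on the final page, where "meta" contains no token.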
Since (a) it sounds like you know how to make a call and get the token but are worried about how long doing all of this manually would take, and (b) I cannot access your data, I'm going to focus on a stylized solution.
Below, I introduce a function `auto_paginate()`. If you replace the placeholder functions I've inserted with code that accomplishes the specified tasks, it will paginate automatically. The loop is embedded within the function, and comments mark where the loop begins and ends and how it exits.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Placeholder functions
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Function that takes a query as input and outputs a results object.
# Can be replaced with, e.g., httr::GET(), but you would then need to add
# additional arguments to the auto_paginate function to make sure you can
# pass everything you need to GET().
F_GET <- function(query) {
  result <- query # replace this with a function that calls the API
  return(result)
}
# Function that takes the result of F_GET (or replacement function) and
# extracts the next_token value if the result contains a token. Must return
# NULL if no token is found.
F_find_next_token <- function(result) {
  token <- result # replace this, ensuring NULL is returned if no token is found
  return(token)
}
###############################################################################
# FUNCTION: auto_paginate
# Given a query and page limit, returns a list where each element of the
# list is the result for a unique page. Either returns all pages or returns
# no more than the page limit of pages.
#
# Arguments:
# - page_limit: numeric value for maximum number of pages to query
# - query: string containing an initial API call
###############################################################################
auto_paginate <- function(page_limit = NULL, query) {
  # Initialize objects needed for the loop or that do not need to be repeated
  null_limit <- is.null(page_limit)
  result_list <- list()
  page_counter <- 1
  # Begin loop: everything within the brackets repeats until an exit condition is met
  repeat {
    # Request and save result
    result <- F_GET(query)
    result_list[[page_counter]] <- result
    # Increment the page counter
    page_counter <- page_counter + 1
    # CONDITIONALLY EXIT LOOP: if the desired page limit has been met
    # note: nested if statements because the first being true is required for
    # the second test
    if (!null_limit) {
      if (page_counter > page_limit) {
        break
      }
    }
    # Look for next_token
    next_token <- F_find_next_token(result)
    # CONDITIONALLY EXIT LOOP: if there is no next_token (no more pages)
    if (is.null(next_token)) {
      break
    }
    # Create the next query by:
    # (1) removing the next_token if previously added (uses a regular expression)
    # (2) adding the new next_token
    query <- gsub("&next_token=.*", "", query)
    query <- paste0(query, "&next_token=", next_token)
  }
  # Loop ended, return results
  return(result_list)
}
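To see the design in action without network access, here is a self-contained run where the placeholders are replaced with mock stand-ins simulating a three-page API. Everything here (mock_pages, the example URL, the token values) is invented for illustration; the loop body is a compact version of `auto_paginate()` above.

```r
# Mock three-page "API": the last page's "meta" has no next_token
mock_pages <- list(
  list(data = c("tweet 1", "tweet 2"), meta = list(next_token = "tokB")),
  list(data = c("tweet 3", "tweet 4"), meta = list(next_token = "tokC")),
  list(data = c("tweet 5"),            meta = list())
)

# Stand-in for F_GET: picks a page based on the token in the query string
F_GET <- function(query) {
  if (!grepl("&next_token=", query)) return(mock_pages[[1]])
  token <- sub(".*&next_token=", "", query)
  if (token == "tokB") mock_pages[[2]] else mock_pages[[3]]
}

# Stand-in for F_find_next_token: NULL when the field is absent
F_find_next_token <- function(result) {
  result$meta$next_token
}

# Compact version of auto_paginate() (nested ifs collapsed with &&)
auto_paginate <- function(page_limit = NULL, query) {
  null_limit <- is.null(page_limit)
  result_list <- list()
  page_counter <- 1
  repeat {
    result <- F_GET(query)
    result_list[[page_counter]] <- result
    page_counter <- page_counter + 1
    if (!null_limit && page_counter > page_limit) break
    next_token <- F_find_next_token(result)
    if (is.null(next_token)) break
    query <- gsub("&next_token=.*", "", query)
    query <- paste0(query, "&next_token=", next_token)
  }
  result_list
}

pages <- auto_paginate(query = "https://api.example.com/search?q=rstats")
length(pages)   # 3: all pages collected

capped <- auto_paginate(page_limit = 2, query = "https://api.example.com/search?q=rstats")
length(capped)  # 2: stopped at the page limit
```

The first call walks every page until `F_find_next_token()` returns `NULL`; the second shows the `page_limit` exit firing first.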