I'm trying to extract the Wikipedia revision history of several hundred pages. However, the MediaWiki API caps the number of revisions returned at 500 per request for any given page (https://www.mediawiki.org/wiki/API:Revisions).
The "rvcontinue" parameter allows you to extract the next 500 and so on, but I'm not sure how to automate this in R. (I've seen some examples of Python code (Why does the Wikipedia API Call in Python throw up a Type Error?), but I don't know how to replicate it in R).
A sample GET request code for one page is appended below, any help is appreciated!
library(httr)

base_url <- "https://en.wikipedia.org/w/api.php"
query_param <- list(action = "query",
                    pageids = "8091",
                    format = "json",
                    prop = "revisions",
                    rvprop = "timestamp|ids|user|userid|size",
                    rvlimit = "max",
                    rvstart = "2014-05-01T12:00:00Z",
                    rvend = "2021-12-30T23:59:00Z",
                    rvdir = "newer"
                    # rvcontinue = the continue value returned from the previous request goes here
                    )
revision_hist <- GET(base_url, query = query_param)
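As I understand it, the continue token can be read off the parsed response (assuming jsonlite for JSON parsing):

library(jsonlite)
parsed <- fromJSON(content(revision_hist, as = "text", encoding = "UTF-8"),
                   simplifyVector = FALSE)
parsed$continue$rvcontinue  # token for the next request; absent once the last batch arrives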
Ideally my GET request would automatically update the rvcontinue parameter after each batch of up to 500 revisions until there are none left.
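Something like the sketch below is what I'm after (untested; get_revision_history is just a name I made up, and the loop follows the continuation scheme described in the API docs, assuming jsonlite for parsing):

library(httr)
library(jsonlite)

# Hypothetical helper: fetch the full revision history for one page id,
# following rvcontinue until the API stops returning a "continue" block.
get_revision_history <- function(pageid) {
  query_param <- list(action = "query",
                      pageids = pageid,
                      format = "json",
                      prop = "revisions",
                      rvprop = "timestamp|ids|user|userid|size",
                      rvlimit = "max",
                      rvstart = "2014-05-01T12:00:00Z",
                      rvend = "2021-12-30T23:59:00Z",
                      rvdir = "newer")
  revisions <- list()
  repeat {
    resp <- GET("https://en.wikipedia.org/w/api.php", query = query_param)
    stop_for_status(resp)
    parsed <- fromJSON(content(resp, as = "text", encoding = "UTF-8"),
                       simplifyVector = FALSE)
    # Revisions for this batch live under query$pages$<pageid>$revisions
    page <- parsed$query$pages[[as.character(pageid)]]
    revisions <- c(revisions, page$revisions)
    # No "continue" block means this was the last batch
    if (is.null(parsed$continue)) break
    # Feed every continuation token (rvcontinue plus the generic
    # "continue" field) back into the next request's parameters
    query_param[names(parsed$continue)] <- parsed$continue
  }
  revisions
}

rev_hist_8091 <- get_revision_history("8091")
length(rev_hist_8091)  # total number of revisions pulled

Repeating that over my vector of page ids (e.g. with lapply) would then give me all the histories.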
Thanks!