I would like to download HTML pages from www.geocaching.com in order to scrape some information. However, the pages I want to download are displayed in two different ways depending on whether the user is logged in, and the information I want to scrape only appears when the user is logged in.
In the past I have used download.file() and mapply() to download HTML files from a list of URLs (geocache_link_list) and name them using another list (geocache_name_list), like this:
mapply(function(x,y) download.file(x,y), geocache_link_list, geocache_name_list)
but this downloads the non-logged-in version of each page.
I also tried RCurl, but it likewise downloaded the non-logged-in page, so I never incorporated it into an mapply() call:
library(RCurl)

baseurl <- geocache_link_list[1]
un <- readline("Type the username: ")
pw <- readline("Type the password: ")
upw <- paste(un, pw, sep = ":")

# Pass the credentials as HTTP authentication; this still returned the
# non-logged-in page, presumably because the site uses a form-based login
html <- getURL(baseurl, userpwd = upw)
Is there a way to run a browser from within R, using something like RSelenium or RCurl, in order to enter the login details, then navigate to the desired pages and download them?
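For reference, here is the kind of thing I imagine, as a rough sketch with RSelenium: log in once through a real browser session, then reuse that session to fetch each page. The element names for the geocaching.com login form are guesses on my part and would need to be checked against the actual page source:

```r
# Sketch only: assumes RSelenium is installed with a working local Firefox,
# and that geocache_link_list / geocache_name_list exist as before.
library(RSelenium)

driver <- rsDriver(browser = "firefox", verbose = FALSE)
remDr <- driver$client

# Log in once; the session cookies persist for subsequent page loads.
# "UsernameOrEmail" and "Password" are hypothetical field names.
remDr$navigate("https://www.geocaching.com/account/signin")
remDr$findElement(using = "name", value = "UsernameOrEmail")$sendKeysToElement(list(un))
remDr$findElement(using = "name", value = "Password")$sendKeysToElement(list(pw, key = "enter"))

# Reuse the logged-in session for each URL, saving the rendered source
mapply(function(x, y) {
  remDr$navigate(x)
  writeLines(remDr$getPageSource()[[1]], y)
}, geocache_link_list, geocache_name_list)

remDr$close()
driver$server$stop()
```

The point is that the browser holds the login cookies, so every page fetched through it should be the logged-in version, unlike download.file() which starts each request with no session.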