
I am trying to write a function that iterates over a specified time span (e.g., the last 30 or 90 days). I'm limited to 2,500 records per pull, so depending on my parameters I may need to pull 1 day at a time or 1 week at a time.
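Because of the 2,500-record cap, one way to approach this is to first split the overall range into fixed-size windows (1 day, 7 days, ...) and issue one request per window. Below is a minimal base-R sketch. `make_windows()` is a hypothetical helper, and the sketch assumes the API treats both ends of a `between` range as inclusive, which has not been verified.

```r
# make_windows() is a hypothetical helper: it splits the inclusive range
# [start, end] into consecutive, non-overlapping windows of `days` days.
# The final window is shortened so it never extends past `end`.
make_windows <- function(start, end, days = 1) {
  starts <- seq(start, end, by = days)        # first day of each window
  ends   <- pmin(starts + (days - 1), end)    # last day, capped at `end`
  data.frame(from = starts, to = ends)
}

# Example: 7-day windows covering the last 30 days (today included)
windows <- make_windows(Sys.Date() - 29, Sys.Date(), days = 7)
print(windows)
```

Each row's `from`/`to` pair could then be pasted into `v1=` as `from^to`. Whether 1-day or 7-day windows stay under 2,500 records depends on the data, so `days` is left as a parameter.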

I have looked at the "API Query for loop" question here, but can't quite get it to do what I want. I have written a while() loop that generates the URLs:

end_date   <- Sys.Date()
start_date <- as.Date("2020-01-27", format = "%Y-%m-%d")

the_date <- start_date

while(the_date <= end_date)
{
  api <-  paste0("https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1=",
               the_date,
               "^", 
               end_date,
               "&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col...")
  the_date <- the_date + 1
  print(api)
}

[1] "https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1=2020-01-27^2020-01-30&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col..."
[1] "https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1=2020-01-28^2020-01-30&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col..."
[1] "https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1=2020-01-29^2020-01-30&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col..."
[1] "https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1=2020-01-30^2020-01-30&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col..."
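For comparison, here is a vectorised sketch that builds the URLs as a named character vector rather than printing them inside a loop. It reuses the query string from the question (the trailing `&col...` is truncated in the original and is left as-is). It hard-codes 2020-01-30 as the end date to mirror the output above; in practice that would be `Sys.Date()`. It also changes each request to a one-day range (`the_date^the_date`), which rests on the unverified assumption that the API's `between` operator is inclusive on both ends.

```r
# Query string pieces copied from the question; "&col..." is truncated there.
base_url <- "https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1="
suffix   <- "&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col..."

start_date <- as.Date("2020-01-27")
end_date   <- as.Date("2020-01-30")   # Sys.Date() in a real run

# One Date per day in the span
days <- seq(start_date, end_date, by = "day")

# One-day window per URL (assumes "between" is inclusive on both ends);
# paste0() formats Dates as YYYY-MM-DD
urls <- paste0(base_url, days, "^", days, suffix)
names(urls) <- as.character(days)
print(urls)
```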

Here is where I get stuck. I would like to create a function that iterates over each URL and then combines the results into one data frame.

When I perform a single pull, I use the following:

library(httr)
library(XML)

api_get  <- GET(url)
api_raw  <- rawToChar(api_get$content)
api_tree <- xmlTreeParse(api_raw, useInternalNodes = TRUE)
api_df   <- xmlToDataFrame(api_tree, nodes = getNodeSet(api_tree, "//pcr"))

Writing this out 30 times is certainly not the most efficient approach, so I'm hoping to get some help on this.
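Rather than repeating the single-pull code once per day, one option is to wrap it in a function and apply that function to every URL with `lapply()`. The sketch below assumes the `//pcr` XML layout from the question. `fetch_pcr()` and `pull_all()` are hypothetical names, and the `httr::stop_for_status()` call is an addition that turns HTTP error responses into R errors.

```r
# fetch_pcr(): the single-pull code from the question wrapped in a function.
# Namespaced calls (httr::, XML::) mean the packages are only needed when it runs.
fetch_pcr <- function(url) {
  api_get  <- httr::GET(url)
  httr::stop_for_status(api_get)              # fail loudly on HTTP errors
  api_raw  <- rawToChar(api_get$content)
  api_tree <- XML::xmlTreeParse(api_raw, useInternalNodes = TRUE)
  XML::xmlToDataFrame(api_tree, nodes = XML::getNodeSet(api_tree, "//pcr"))
}

# pull_all(): apply a fetcher to every URL, pausing between requests,
# then stack the per-URL data frames into one.
pull_all <- function(urls, fetch = fetch_pcr, pause = 0.7) {
  results <- lapply(urls, function(u) {
    df <- fetch(u)
    Sys.sleep(pause)                          # be polite to the server
    df
  })
  do.call(rbind, results)
}
```

Usage would be `alloutput <- pull_all(urls)` for a character vector of request URLs. Passing `fetch` as an argument lets you swap in a stub function to test the looping and combining logic without network access.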

Marshall

1 Answer


This script should work, assuming your URLs and your parsing of the API response are correct.
See the comments in the code for details:

library(httr)
library(XML)

end_date   <- Sys.Date()
start_date <- as.Date("2020-01-27", format = "%Y-%m-%d")
the_date   <- start_date

#create an empty list to hold one data frame per date
output <- list()

while (the_date <= end_date)
{
  #Track which date is being pulled - handy for debugging when the script errors
  print(the_date)
  #Note: v1 spans the_date^end_date, so successive ranges overlap and the combined
  #output will contain duplicates; for one-day pulls the range end must change too
  url <- paste0("https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1=",
                the_date,
                "^",
                end_date,
                "&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col...")

  api_get  <- GET(url)
  #Stop with an informative message if the server returns an HTTP error
  stop_for_status(api_get)
  api_raw  <- rawToChar(api_get$content)
  api_tree <- xmlTreeParse(api_raw, useInternalNodes = TRUE)

  #Append data frame to list - item named by date
  output[[as.character(the_date)]] <- xmlToDataFrame(api_tree, nodes = getNodeSet(api_tree, "//pcr"))
  #Brief pause between requests to avoid overloading the server
  Sys.sleep(0.7)

  the_date <- the_date + 1
}

#combine all of the data frames in the output list into one large data frame
alloutput <- do.call(rbind, output)
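One caveat worth keeping in mind: `do.call(rbind, output)` requires every day's data frame to have the same columns. If some records omit optional fields, `xmlToDataFrame()` may produce different columns on different days, and `rbind` will then fail. Below is a base-R sketch that pads missing columns with `NA` before binding. `rbind_fill()` is a hypothetical name; `dplyr::bind_rows()` and `data.table::rbindlist(fill = TRUE)` provide the same fill-with-NA behavior.

```r
# rbind_fill(): stack a list of data frames whose columns may differ,
# filling any column a data frame lacks with NA. NULL entries (e.g. from
# skipped or failed pulls) and zero-row data frames are dropped first.
rbind_fill <- function(dfs) {
  dfs <- Filter(function(d) !is.null(d) && nrow(d) > 0, dfs)
  if (length(dfs) == 0) return(data.frame())
  all_cols <- unique(unlist(lapply(dfs, names)))   # union of column names
  padded <- lapply(dfs, function(d) {
    for (m in setdiff(all_cols, names(d))) d[[m]] <- NA
    d[all_cols]                                    # same column order everywhere
  })
  do.call(rbind, padded)
}
```

In the script above, the last line would become `alloutput <- rbind_fill(output)`.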
Dave2e
  • Thanks @Dave2e! When I run the script, I'm getting the following error message: `Error in curl::curl_fetch_memory(url, handle = handle) : Send failure: Connection was reset` – Marshall Jan 30 '20 at 21:53
  • Running `traceback()` gives the following (not sure if this helps): `> traceback() 5: curl::curl_fetch_memory(url, handle = handle) 4: request_fetch.write_memory(req$output, req$url, handle) 3: request_fetch(req$output, req$url, handle) 2: request_perform(req, hu$handle$handle) 1: GET(url) > ` I'm really at a loss. – Marshall Jan 30 '20 at 22:23
  • @Marshall, I didn't verify that the URLs are correct. Are you sure you have the correct URL to ping? website.com seems to be a web hosting service. – Dave2e Jan 31 '20 at 00:48
  • I did. I used a generic name for the website because the data source is sensitive. I’ll re-examine my URL to make sure it’s correct. If it is, do you have any thoughts on why I could be getting these errors? – Marshall Jan 31 '20 at 01:01
  • I don't have a ready answer for you. A couple of possibilities: an incorrect URL, a firewall/proxy server preventing you from reaching the site, or incorrect user credentials in the URL. This is getting beyond my experience. – Dave2e Jan 31 '20 at 01:05
  • Gotcha... I’ll give it a go tomorrow and hopefully get it to work. Thank you for your help on this; can’t tell you how much I appreciate it. – Marshall Jan 31 '20 at 01:09
  • I had a small typo in my script. Was able to get it to work! – Marshall Jan 31 '20 at 13:35
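The `Send failure: Connection was reset` error above turned out to be caused by a typo. Still, over 30 or 90 requests, a single transient network failure would stop the whole loop. Below is a hedged sketch of a small base-R retry wrapper. `with_retry()` is a hypothetical helper; httr also ships `RETRY()` (e.g. `httr::RETRY("GET", url)`) for the same purpose. Retries only help with transient failures: a wrong URL or bad credentials will fail on every attempt.

```r
# with_retry(): call f() up to `times` times, waiting `wait` seconds between
# attempts. Returns f()'s value on the first success; re-throws the last
# error if every attempt fails.
with_retry <- function(f, times = 3, wait = 1) {
  last_err <- NULL
  for (i in seq_len(times)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    last_err <- result
    message("Attempt ", i, " failed: ", conditionMessage(result))
    if (i < times) Sys.sleep(wait)
  }
  stop(last_err)
}

# Usage inside the answer's loop (url as defined there):
# api_get <- with_retry(function() GET(url))
```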