I am trying to automate a process in R that involves downloading a zipped folder from an API*, accessing its contents (a few .csv/.xml files), and then extracting the .csv/.xml I actually care about into a dataframe (or something else workable). However, I am having some problems accessing the contents of the API pull. From what I gather, the usual process for pulling from an API is to use GET() from the httr package to access the API's files, then the jsonlite package to process them. The second step in this process is failing me. The code I have been trying is roughly as follows:

library(httr)
library(jsonlite)

req <- "http://request.path.com/thisisanapi/SingleZip?option1=yes&option2=no"
res <- GET(url = req)
#this works as expected, with res$status_code == 200

#OPTION 1:
api_char <- rawToChar(res$content)
api_call <- fromJSON(api_char, flatten=T)

#OPTION 2:
api_char2 <- content(res, "text")
api_call2 <- fromJSON(api_char2, flatten=T)

In option 1, the first line fails with an "embedded nul in string" error. In option 2, the second line fails with a "lexical error: invalid char in json text" error.

I did some reading and found a few related threads. First, this person looks to be doing something very similar to me but did not experience this error (which suggests that maybe the files are zipped/stored differently between the APIs the two of us are using, and that I have set up the GET() incorrectly?). Second, this person seems to be experiencing a similar problem converting the raw data from the API. I attempted the fix from that thread, but it did not work: in Option 1, the first line now ran but the second line gave the same "lexical error: invalid char in json text" as before, and in Option 2 the second line gave an "if (is.character(txt) && length(txt) == 1 && nchar(txt, type = "bytes") < : missing value where TRUE/FALSE needed" error, which I am not quite sure how to interpret. This may be because the content_type differs between our API pulls: mine is application/x-zip-compressed while theirs is text/tab-separated-values; charset=utf-16le, so maybe removing the null characters is altogether inappropriate here.
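For reference, this is how I checked the content type on my end, using http_type() from httr:

# confirm what the server says it is returning
httr::http_type(res)
#> [1] "application/x-zip-compressed"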

There is some documentation on usage of the API I am using*, but a lot of it is a few years old now and seems to focus more on manual usage than on integration into large automated downloads like the one I am working on (my end goal is a loop that executes the process described above many times over slightly varying URLs). I am most certainly a beginner at using APIs like this, and would really appreciate some insight!

* = specifically, I am pulling from CAISO's OASIS API. If you want to follow along with some real files, replace "http://request.path.com/thisisanapi/SingleZip?option1=yes&option2=no" with "http://oasis.caiso.com/oasisapi/SingleZip?resultformat=6&queryname=PRC_INTVL_LMP&version=3&startdatetime=20201225T09:00-0000&enddatetime=20201226T9:10-0000&market_run_id=RTM&grp_type=ALL"

2 Answers

I think the main issue here is that you don't have a JSON return from the API. You have a .zip file being returned, as binary (I think?) data. Your challenge is to process that data. I don't think fromJSON() will help you, as the data from the API isn't in JSON format.

Here's how I would do it. I prefer to use the httr2 package. The process below makes it nice and clear what the parameters of the query are.

library(httr2)

req <- httr2::request("http://oasis.caiso.com/oasisapi")

query <- req %>% 
  httr2::req_url_path_append("SingleZip") %>% 
  httr2::req_url_query(resultformat = 6) %>% 
  httr2::req_url_query(queryname = "PRC_INTVL_LMP") %>% 
  httr2::req_url_query(version = 3) %>% 
  httr2::req_url_query(startdatetime = "20201225T09:00-0000") %>% 
  httr2::req_url_query(enddatetime = "20201226T9:10-0000") %>% 
  httr2::req_url_query(market_run_id = "RTM") %>% 
  httr2::req_url_query(grp_type = "ALL")

# Check what our query looks like
query
#> <httr2_request>
#> GET
#> http://oasis.caiso.com/oasisapi/SingleZip?resultformat=6&queryname=PRC_INTVL_LMP&version=3&startdatetime=20201225T09%3A00-0000&enddatetime=20201226T9%3A10-0000&market_run_id=RTM&grp_type=ALL
#> Body: empty

resp <- query %>% 
  httr2::req_perform()

# Check what content type and encoding we have
# All looks good
resp %>% 
  httr2::resp_content_type()
#> [1] "application/x-zip-compressed"
resp %>% 
  httr2::resp_encoding()
#> [1] "UTF-8"

Created on 2022-08-30 with reprex v2.0.2

Then you have a choice of what to do if you want to write the data to a zip file.

I discovered that the brio package will write raw data to a file nicely. Alternatively, you can use download.file() to download the .zip directly from the URL (skipping all the httr2 work above); you just need to use mode = "wb".

resp %>% 
  httr2::resp_body_raw() %>% 
  brio::write_file_raw(path = "out.zip")

# alternative using your original URL or query$url

download.file(query$url, "out.zip", mode = "wb")
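
If you want a dataframe rather than a file on disk, here is a sketch of one way to continue from resp (the single-CSV assumption and file names are mine; with grp_type=ALL the archive may contain several files, in which case pick the one you want from the listing):

# write the raw response body to a temporary zip file
tmp_zip <- tempfile(fileext = ".zip")
resp %>% 
  httr2::resp_body_raw() %>% 
  brio::write_file_raw(path = tmp_zip)

# list the archive contents and read the first CSV into a dataframe
csv_name <- unzip(tmp_zip, list = TRUE)$Name[1]
df <- read.csv(unz(tmp_zip, csv_name))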
Francis Barton

OASIS returns a zipped file. One method is to write the response contents to a temporary file and then either read that file into a dataframe or unzip it to a local directory.

  # Read zipped results into a data frame (assumes CSV content was requested,
  # i.e. resultformat=6, and that resp is the httr response from GET())
  library(httr)
  library(readr)

  temp_file <- tempfile()                   # create temporary file
  writeBin(content(resp, "raw"), temp_file) # write zipped response body to temp file
  df <- read_csv(temp_file)                 # read_csv reads single-file zips directly
  file.remove(temp_file)                    # remove temporary file


  # Unzip file and save contents to a folder ####
  temp_file <- tempfile()                   # create temp file
  writeBin(content(resp, "raw"), temp_file) # write zipped response body to temp file
  unzip(temp_file, exdir = save_to_folder)  # extract contents to save_to_folder
  file.remove(temp_file)                    # remove temporary file
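
Since the question's end goal is a loop over slightly varying URLs, the steps above can also be wrapped in a function. This is only a sketch (the helper name and URL construction are mine, and it assumes the zip holds a single CSV so read_csv can read it directly; otherwise unzip first as above):

library(httr)
library(readr)

# hypothetical helper: pull one date range of PRC_INTVL_LMP data as a dataframe
fetch_oasis_csv <- function(startdatetime, enddatetime) {
  url <- paste0(
    "http://oasis.caiso.com/oasisapi/SingleZip",
    "?resultformat=6&queryname=PRC_INTVL_LMP&version=3",
    "&startdatetime=", startdatetime,
    "&enddatetime=", enddatetime,
    "&market_run_id=RTM&grp_type=ALL"
  )
  resp <- GET(url)
  stop_for_status(resp)                     # fail loudly on a bad pull
  temp_file <- tempfile(fileext = ".zip")
  on.exit(file.remove(temp_file))           # clean up the temp file on exit
  writeBin(content(resp, "raw"), temp_file) # save the zip to a temp file
  read_csv(temp_file)                       # readr reads single-file zips directly
}

# e.g. df <- fetch_oasis_csv("20201225T09:00-0000", "20201226T09:10-0000")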

R Avalos