2

I am trying and failing to get the following process to complete in R Version 3.1.2:

library(RCurl)
library(XLConnect)
yr <- substr(Sys.Date(), 1, 4)
mo <- as.character(as.numeric(substr(Sys.Date(), 6, 7)) - 1)
temp <- tempfile()
temp <- getForm("http://strikemap.clb.org.hk/strikes/api.v4/export",
  FromYear = "2011", FromMonth = "1", 
  ToYear = yr, ToMonth = mo,
  `_lang` = "en")
CLB <- readWorksheetFromFile(temp, sheet=1)
unlink(temp)

I have been able manually to export the requested data set and then read it into R from a local directory using the same readWorksheetFromFile syntax. My goal now is to do the whole thing in R. The call to the API seems to work (thanks to some earlier help), but the process fails at the next step, when I try to ingest the results. Here's what happens:

> CLB <- readWorksheetFromFile(temp, sheet=1)
Error in path.expand(filename) : invalid 'path' argument

Any thoughts on what I'm doing wrong or what's broken?

Community
  • 1
  • 1
ulfelder
  • 5,305
  • 1
  • 22
  • 40
  • I did, and I get essentially the same error: `Error in file.exists(path) : invalid 'file' argument`. – ulfelder Jul 05 '15 at 10:27
  • I think you've mistakenly assumed that the result of `getForm()` is a file. – hadley Jul 05 '15 at 13:01
  • Ahhh, okay. What is it, HTML? If I run `str()` on the temp file produced by my `getForm()` call, I see that it's an atomic vector composed of unrecognizable elements. So how do I translate that vector in the table I'm after? Or is there a better way to get there? – ulfelder Jul 06 '15 at 10:59
  • You need to save the results of `getForm()` to a file... – hadley Jul 06 '15 at 15:14
  • Thank you! I've (almost) gotten this to work now. I will post the details in an answer and start a new question for the remaining glitch if necessary. – ulfelder Jul 06 '15 at 17:19

1 Answers1

0

Turns out the problem didn't lie with XLConnect at all. Based on Hadley's tip that I needed to save the results of my query to the API to a file before reading them back into R, I have managed (almost) to complete the process using the following code:

library(httr)
library(readxl)
yr <- substr(Sys.Date(), 1, 4)
mo <- as.character(as.numeric(substr(Sys.Date(), 6, 7)) - 1)
baseURL <- paste0("http://strikemap.clb.org.hk/strikes/api.v4/export?FromYear=2011&FromMonth=1&ToYear=", yr, "&ToMonth=", mo, "&_lang=en")
queryList <- parse_url(baseURL)
clb <- GET(build_url(queryList), write_disk("clb.temp.xlsx", overwrite=TRUE))
CLB <- read_excel("clb.temp.xlsx")

The object that creates, CLB, includes the desired data with one glitch: the dates in the first column are not being read properly. If I open "clb.temp.xlsx" in Excel, they show up as expected (e.g., 2015-06-30, or 6/30/2015 if I click on the cell). But read_excel() is reading them as numbers that don't track to those dates in an obvious way (e.g., 42185 for 2015-06-30). I tried fixing that by specifying that they were dates in the call to read_excel, but that produced a long string of warnings about expecting dates but getting those numbers.

If I use readWorkSheetFromFile() instead of read_excel at that last step, here's what happens:

> CLB <- readWorksheetFromFile("clb.temp.xlsx")
Error in (function (classes, fdef, mtable)  : unable to find an inherited method for function ‘readWorksheet’ for signature ‘"workbook", "missing"’

I will search for a solution to the problem using read_excel and will create a new question if I can't find one.

ulfelder
  • 5,305
  • 1
  • 22
  • 40
  • I was able to convert the numbers to the proper dates in R using this line of code: `CLB$Date <- as.Date(CLB$Date, format = "%Y-%m-%d", origin = "1899-12-30")`. I have no idea why that origin date works and had to figure it out through trial and error, but there it is. – ulfelder Jul 06 '15 at 18:03