
I'm trying to adopt the Reproducible Research paradigm while meeting halfway the people who prefer looking at Excel files rather than text data files. My plan is to host the Excel files on Dropbox and then read them using the xlsx package.

Much as with downloading and unpacking a zipped file, I assumed something like the following would work:

# Prerequisites
require("xlsx")
require("ggplot2")
require("repmis")
require("devtools")
require("RCurl")

# Downloading data from Dropbox location
link <- paste0(
    "https://www.dropbox.com/s/",
    "{THE SHA-1 KEY}",
    "{THE FILE NAME}"
)

url <- getURL(link)

temp <- tempfile()
download.file(url, temp)

However, I get the following error:

Error in download.file(url, temp) : unsupported URL scheme

Is there an alternative to download.file that will accept this URL scheme? Thanks, Jon

JonMinton
  • I'm not familiar with how DropBox works on this particular kind of an issue; is the problem that you aren't authenticating, or that for some reason, RCurl is barfing on your URL? Is the link you would use publicly available? – TARehman May 08 '14 at 22:49
  • Also...have you checked to see if `getURL()` is actually working in this case, or if you are running into a certificate issue? By default, curl won't know where to look for a certificate, and so will reject. You can either turn off CA verification, or point it at a certificate. – TARehman May 08 '14 at 22:55
  • similar recent question: http://stackoverflow.com/questions/23531897/read-xls-gdata-from-an-https-url. – GSee May 09 '14 at 02:47
  • Try something other than the default for the `method` argument to `download.file`. You could also set the method with e.g. `options(download.file.method="curl")` to effectively change the default value. – GSee May 09 '14 at 03:13
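
The comments above can be put together in a short sketch. It rests on two assumptions: that the `curl` command-line tool is installed, and that the likely cause of the error is `getURL()`, which returns the contents of the page rather than a URL, so that `download.file()` is handed a block of HTML in place of a link. The helper name `fetch_file` is hypothetical, and the placeholders in the usage comment are the ones from the question.

```r
# Sketch: pass the link itself to download.file() and choose a method
# that understands https. fetch_file() is a hypothetical helper name.
fetch_file <- function(link, dest = tempfile(fileext = ".xlsx"), method = "curl") {
    stopifnot(is.character(link), length(link) == 1)
    # mode = "wb" keeps binary files such as .xlsx intact on Windows
    status <- download.file(link, destfile = dest, method = method, mode = "wb")
    if (status != 0) stop("download failed with status ", status)
    dest
}

# Usage, once the placeholders are filled in:
# link <- "https://www.dropbox.com/s/{THE SHA-1 KEY}/{THE FILE NAME}"
# path <- fetch_file(link)
```

Setting `options(download.file.method = "curl")` once per session, as suggested above, would have the same effect as passing `method` on every call.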

2 Answers


You have the wrong URL: the one you are using just goes to the landing page. I think the actual download URL is different, and I managed to get it more or less working using the code below.

I don't think you need RCurl or the getURL() function at all, and I think your previous formulation left out some relatively important slashes.

Try the following:

link <- paste("https://dl.dropboxusercontent.com/s",
              "{THE SHA-1 KEY}",
              "{THE FILE NAME}",
              sep = "/")

download.file(url = link, destfile = "your.destination.xlsx")
closeAllConnections()
TARehman
  • This probably won't work on non-Windows platforms without `method=curl` or `method=wget` because `?download.file` says "Note that https:// URLs are not supported by the internal method." – GSee May 09 '14 at 02:45
  • Hi. Thanks for this. The SHA and file name parts were just for illustration, to avoid broadcasting the SHA-1 key. The '/'s were in place. The problem is exactly as before: 'unsupported URL scheme'. I'm using OS X, so it might work on Windows. I'll find out in a week when I next get access to a Windows machine. – JonMinton May 09 '14 at 18:56
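
To make the point about separators concrete, here is a small sketch comparing the two ways of building the link as posted. The key `abc123` and the file name `data.xlsx` are made-up stand-ins, not real Dropbox values.

```r
key <- "abc123"           # hypothetical stand-in for the share key
file_name <- "data.xlsx"  # hypothetical file name

# paste0() adds no separators, so the key and file name run together
joined <- paste0("https://www.dropbox.com/s/", key, file_name)

# paste(..., sep = "/") inserts a slash between every component
direct <- paste("https://dl.dropboxusercontent.com/s", key, file_name, sep = "/")

print(joined)  # "https://www.dropbox.com/s/abc123data.xlsx"
print(direct)  # "https://dl.dropboxusercontent.com/s/abc123/data.xlsx"
```

Printing the assembled link before downloading is a cheap way to rule this kind of problem out.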

UPDATE:

I just realised there is a source_XlsxData function in the repmis package, which in theory should do the job perfectly.

Also, the function below works only some of the time and appears to get stuck at the GET line, so a better solution would be very welcome.


I decided to try taking a step back and figure out how to download a raw file from a secure (https) url. I adapted (butchered?) the source_url function in devtools to produce the following:

download_file_url <- function(url, outfile, ...) {
    # httr supplies GET(), stop_for_status() and content()
    require(httr)

    stopifnot(is.character(url), length(url) == 1)

    # Extra arguments in ... are passed on to GET()
    request <- GET(url, ...)
    stop_for_status(request)

    # Write the raw response body to disk in binary mode
    filetag <- file(outfile, "wb")
    on.exit(close(filetag))
    writeBin(content(request, as = "raw"), filetag)
}

This seems to work for producing local copies of binary files, Excel included. Neater, smarter improvements on this would be gratefully received.
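
One possible tidier variant, offered as a sketch and not as a tested fix: httr's `write_disk()` can stream the response body straight to a file, and `timeout()` makes a stalled request fail instead of hanging. The function name `download_binary` and the 30-second default are assumptions chosen for illustration.

```r
# Sketch of a variant; download_binary() is a hypothetical name.
download_binary <- function(url, outfile, seconds = 30) {
    stopifnot(is.character(url), length(url) == 1)
    # write_disk() saves the body to outfile; timeout() aborts a stalled request
    response <- httr::GET(
        url,
        httr::write_disk(outfile, overwrite = TRUE),
        httr::timeout(seconds)
    )
    httr::stop_for_status(response)
    invisible(outfile)
}

# Usage, once the placeholders are filled in:
# link <- "https://dl.dropboxusercontent.com/s/{THE SHA-1 KEY}/{THE FILE NAME}"
# path <- download_binary(link, tempfile(fileext = ".xlsx"))
# dat  <- xlsx::read.xlsx(path, sheetIndex = 1)
```

Whether a timeout cures the intermittent hang at the GET line is untested here; it should at least turn a silent stall into an error that can be caught and retried.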

JonMinton