
I am having problems getting the RCurl function `getURL` to access an HTTPS URL on a server that uses a self-signed certificate. I'm running R 3.0.2 on Mac OS X 10.9.2.

I have read the FAQ and the curl page on the subject. So this is where I stand:

  1. I have saved a copy of the certificate to disk (~/cert.pem).
  2. I have successfully used this same file to connect to the server with python-requests and its `verify` option.
  3. curl on the command line seems to ignore the `--cacert` option. I could only access the website with it after marking the certificate as trusted in the Mac OS X Keychain Access app.
  4. RCurl stubbornly refuses to connect to the website with the following code:

    getURL("https://somesite.tld", verbose=TRUE, cainfo=normalizePath("~/cert.pem"))

This is the output I get:

    * Adding handle: conn: 0x7f92771b0400
    * Adding handle: send: 0
    * Adding handle: recv: 0
    * Curl_addHandleToPipeline: length: 1
    * - Conn 38 (0x7f92771b0400) send_pipe: 1, recv_pipe: 0
    * About to connect() to somesite.tld port 443 (#38)
    *   Trying 42.42.42.42...
    * Connected to somesite.tld (42.42.42.42) port 443 (#38)
    * SSL certificate problem: Invalid certificate chain
    * Closing connection 38

When I ran both curl with the `--cacert` option and the RCurl code above in a Linux VM, with the same cert.pem file and the exact same URL, both worked perfectly.

So the same tests succeed on Linux and fail only on Mac OS X. Even adding the certificate to the keychain didn't fix RCurl.
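
Since the identical code behaves differently on the two machines, one quick comparison is which TLS library each machine's libcurl was built against. A minimal sketch, assuming RCurl's `curlVersion()` reports this in its `ssl_version` field; run it on both the Mac and the Linux VM and compare:

```r
# Sketch: print the libcurl version and the TLS library RCurl is linked against.
# Run on both machines; a different TLS backend could explain different
# handling of the cainfo option.
library(RCurl)

info <- curlVersion()             # list describing the underlying libcurl build
ssl_backend <- info$ssl_version   # version string of the TLS library in use
cat("libcurl:", info$version, "\n")
cat("TLS library:", ssl_backend, "\n")
```

If the Mac reports Apple's SecureTransport (DarwinSSL) while Linux reports OpenSSL or GnuTLS, that would be consistent with the Mac-only failure; as far as I know, the `Invalid certificate chain` wording in the log above comes from curl's SecureTransport backend, but treat that as a lead to check rather than a confirmed diagnosis.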

The only thing that works is setting `ssl.verifypeer=FALSE`, but I don't want to do that for security reasons.

I'm out of ideas here. Does anyone have suggestions on how to get this to work?

Thomas
asieira
  • Added a ticket to see whether the author can give me some feedback: https://github.com/omegahat/RCurl/issues/6 – asieira Feb 27 '14 at 21:44
  • I've come to the conclusion that RCurl is hopeless. Reading the source code, for example, I realized that no effort whatsoever was made to close the underlying C libcurl handles when they are garbage collected, which leaves connections open and causes resource exhaustion if you use it for more than a few trivial downloads (https://github.com/omegahat/RCurl/issues/8). – asieira Mar 23 '14 at 20:01
  • I have instead written some Python code that uses [requests](http://docs.python-requests.org/en/latest/) and [grequests](https://github.com/kennethreitz/grequests), plus a small R wrapper that calls that code externally. This yielded a huge performance gain; I was able to use the certificate correctly, and I had no resource exhaustion problems. – asieira Mar 23 '14 at 20:03

2 Answers


You can try passing a CA bundle through the `cainfo` option:

    library(RCurl)
    URL1 <- "https://data.mexbt.com/ticker/btcusd"
    # cainfo points libcurl at a CA bundle file; here, the cacert.pem shipped with RCurl
    getURL(URL1, cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))
  • I haven't tested this since I am not using RCurl anymore, but the `cainfo` curl option is certainly the way to go. Nice catch. – asieira Mar 25 '16 at 13:58

Coming back to this issue, I just want to point out that if you are still using RCurl, you should switch to httr (which is built on the curl package) instead.

I have confirmed that passing `config(cainfo="/path/to/certificate")` to httr requests works as intended.
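
A minimal sketch of that httr approach, assuming the certificate is saved as a PEM file such as `~/cert.pem` from the question. The helper name `fetch_with_ca` is hypothetical, and the calls are namespaced with `httr::` so the function works without `library(httr)`:

```r
# Sketch: fetch a URL over HTTPS with httr, trusting a specific CA file
# (e.g. a self-signed server certificate saved as PEM).
# fetch_with_ca() is a hypothetical helper, not part of httr.
fetch_with_ca <- function(url, ca_file) {
  # Expand "~" and fail early if the certificate file does not exist.
  ca_path <- normalizePath(ca_file, mustWork = TRUE)
  # config(cainfo = ...) sets libcurl's CAINFO option for this request only.
  resp <- httr::GET(url, httr::config(cainfo = ca_path))
  # Turn HTTP error statuses (4xx/5xx) into R errors.
  httr::stop_for_status(resp)
  httr::content(resp, as = "text", encoding = "UTF-8")
}
```

Usage would be `fetch_with_ca("https://somesite.tld", "~/cert.pem")`. To apply the same certificate to every subsequent request instead of one call, `httr::set_config(httr::config(cainfo = ...))` can be used.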

asieira
  • Note that `config()` is part of the `httr` package, so `httr::config(cainfo="/path/to/certificate")` works even if you have not loaded `httr` with `library()`. – SaschaH Jun 11 '19 at 08:45