4

I wish to use read.csv to read a google doc spreadsheet.

I try using the following code:

data_url <- "http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv"
read.csv(data_url)

Which results in the following error:

Error in file(file, "rt") : cannot open the connection

I'm on windows 7. And the code was tried on R 2.12 and 2.13

I remember trying this a few months ago and it worked fine. Any suggestion what might be causing this or how to solve it?

Thanks.

Gavin Simpson
  • 170,508
  • 25
  • 396
  • 453
Tal Galili
  • 24,605
  • 44
  • 129
  • 187
  • 3
    I do not know if this will make you happy or sad. Your code works perfectly on my installation on Windows 7 and R 2.13 – Farrel Apr 30 '11 at 05:57

3 Answers3

9

It might have something to do with the fact that Google is reporting a 302 temporarily moved response.

> download.file(data_url, "~/foo.csv", method = "wget")
--2011-04-29 18:01:01--  http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv
Resolving spreadsheets0.google.com... 74.125.230.132, 74.125.230.128, 74.125.230.130, ...
Connecting to spreadsheets0.google.com|74.125.230.132|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv [following]
--2011-04-29 18:01:01--  https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv
Connecting to spreadsheets0.google.com|74.125.230.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: `/home/gavin/foo.csv'

    [ <=>                                                                                                                                                   ] 41          --.-K/s   in 0s      

2011-04-29 18:01:02 (1.29 MB/s) - `/home/gavin/foo.csv' saved [41]

> read.csv("~/foo.csv")
  column1 column2
1       a       1
2       b       2
3      ds       3
4       d       4
5       f       5
6      ga       5

I'm not sure R's internal download code is capable of responding to such redirects:

> download.file(data_url, "~/foo.csv")
trying URL 'http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv'
Error in download.file(data_url, "~/foo.csv") : 
  cannot open URL 'http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv'
Gavin Simpson
  • 170,508
  • 25
  • 396
  • 453
3

I ran into the same problem and eventually found a solution in a forum thread. Using my own public CSV file:

library(RCurl)
tt = getForm("https://spreadsheets.google.com/spreadsheet/pub", 
          hl ="en_US", key = "0Aonsf4v9iDjGdHRaWWRFbXdQN1ZvbGx0LWVCeVd0T1E", 
          output = "csv", 
         .opts = list(followlocation = TRUE, verbose = TRUE, ssl.verifypeer = FALSE)) 

holidays <- read.csv(textConnection(tt))
John Horton
  • 4,122
  • 6
  • 31
  • 45
1

Check the solution on http://blog.forret.com/2011/07/google-docs-infamous-moved-temporarily-error-fixed/

So what is the solution: just add “&ndplr=1” to your URL and you will skip the authentication redirect. I’m not sure what the NDPLR parameter name stands for, let’s just call it: “Never Do Published Link Redirection“.

pforret
  • 67
  • 2
  • Hi there. Thanks for the idea - but it doesn't seem to fix it for my case. – Tal Galili Jul 16 '11 at 07:40
  • I see. In your case, like Gavin mentioned above, it redirects to the same URL but with https:// in front of it. From experience, Google now does all exports over https, so it would be safe to always substitute http for https. And, in case Google would ask for authentication, also add &ndplr=1 :-) – pforret Jul 16 '11 at 11:43