
I am trying to use the getURL function from the RCurl package to access an ASP web page:

library(RCurl)
my_url <- "http://www.my_site.org/my_site/main.asp?ID=11&REFID=33"
webpage <- getURL(my_url)

but I get an Object Moved redirection error message like:

    "<head><title>Object moved</title></head>\n<body><h1>Object Moved</h1>
This object may be found <a HREF=\"/my_site/index.asp\">here</a>.</body>\n"

I followed various suggestions, such as URL-encoding with the curlEscape function and setting the CURLOPT_FOLLOWLOCATION and CURLOPT_SSL_VERIFYHOST parameters via the curlSetOpt function, as described in the linked question "php ssl curl : object moved error", but the latter two were not recognized as valid RCurl options.

Any suggestions on how to overcome the issue?

user274136
  • Are you behind a proxy? If so, you need to declare it. See, e.g. http://rosettacode.org/wiki/HTTP#R – Richie Cotton Sep 01 '10 at 10:46
  • Richie, I tried both variants with proxy set and without (using also the commandline curl.exe) and the exact same problem persists. – user274136 Sep 01 '10 at 11:30

1 Answer


Use the followlocation curl option:

getURL(u,.opts=curlOptions(followlocation=TRUE))

with added cookiefile goodness - it's supposed to be a file that doesn't exist, but I'm not sure how you can be sure of that:

w=getURL(u,.opts=curlOptions(followlocation=TRUE,cookiefile="nosuchfile"))
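The libcurl constants named in the question appear in RCurl under different spellings: as far as I know, the CURLOPT_ prefix is dropped, the name is lower-cased, and underscores become dots. A minimal sketch for checking which names RCurl accepts, assuming RCurl is installed; the `maxredirs` value of 10 is an illustrative choice, not something from the original answer:

```r
library(RCurl)

# RCurl option names are the libcurl names without the CURLOPT_ prefix,
# lower-cased, with "_" replaced by "."
# (e.g. CURLOPT_SSL_VERIFYHOST -> ssl.verifyhost).
wanted <- c("followlocation", "ssl.verifyhost", "maxredirs", "cookiefile")
known <- wanted %in% listCurlOptionNames()
print(setNames(known, wanted))

# Options bundle: follow redirects, but give up after 10 hops (illustrative
# value) so that a redirect loop raises an error rather than running on.
opts <- curlOptions(followlocation = TRUE, maxredirs = 10L)
```

Passing `opts` as `.opts` to getURL works the same way as the calls above. Capping the redirect count may help if a call with `followlocation=TRUE` never seems to return.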
Spacedman
  • Thank you for your prompt answer. I used the followlocation option, but it appears that getURL never finishes. I left it running for over 5 minutes and there was no progress at all. Maybe some other curl option is needed? – user274136 Sep 01 '10 at 10:24
  • What happens if you getURL the URL given in the redirect? ie getURL("http://www.my_site.org/my_site/index.asp")? I assume your PC can connect to that web site okay. Try getting the command-line Curl client and seeing if that connects, then add the complication of doing it in R. – Spacedman Sep 01 '10 at 10:30
  • @Spacedman, running the redirect URL through command-line curl leads through two more redirects (each with the "Object Moved" message) to a point where the ASP REFID parameter is reported as "not recognized as an internal or external command". – user274136 Sep 01 '10 at 11:08
  • Sounds impossible to get any further with this without access to the server. – Spacedman Sep 01 '10 at 14:12
  • @Spacedman, following your advice again and running curl on the command line, I discovered via the verbose option that the "&" had to be encoded because DOS was interpreting it wrongly; the GET request shown in the output did not include REFID. Nevertheless, the curl run still ends with the "Object moved" message, plus the messages "Connection #0 to host www.my_site.org left intact" and "Closing connection #0". – user274136 Sep 01 '10 at 14:44
  • Is there no way you can give us access to that server? Or can you find a public server that exhibits the same problem? – Spacedman Sep 01 '10 at 14:50
  • An according URL could be like http://www.ergose.gr/ergosesite/inner.asp?CONTAINERID=3&REFCI=109&LANGUAGE=1 – user274136 Sep 01 '10 at 14:58
  • It seems to go into a mad loop over three redirects. The first redirect is also sending back some cookie info, which I imagine getURL isn't pushing back to the server on the redirect. curl on the command line does this, so it's not an R problem any more. It works on a web browser because the browser probably sends back the cookie info. The docs for curl say to use the -b option, which works on the command line. Not sure how to get this into R though.. – Spacedman Sep 01 '10 at 16:42
  • Indeed the curl cookie option "-L -b empty.txt" delivers some information after the loop. Nevertheless the 2 included ASP parameters won't be taken into account. – user274136 Sep 03 '10 at 09:41
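One way to try the `-L -b` behaviour from R is a curl handle that follows redirects and keeps cookies between hops. This is only a sketch: `fetch_with_cookies` and `max_hops` are hypothetical names, and it assumes that an empty `cookiefile` value switches on libcurl's in-memory cookie engine, as it does for CURLOPT_COOKIEFILE.

```r
library(RCurl)

# Hypothetical helper, not part of RCurl: fetch a page while following
# redirects and sending back any cookies set along the way.
fetch_with_cookies <- function(url, max_hops = 10L) {
  h <- getCurlHandle(
    followlocation = TRUE,  # like curl -L
    cookiefile = "",        # assumed to enable the cookie engine, like curl -b with an empty file
    maxredirs = max_hops    # stop a redirect loop with an error instead of looping on
  )
  getURL(url, curl = h)
}

# URL taken from the comment thread; the server may no longer respond.
# page <- fetch_with_cookies("http://www.ergose.gr/ergosesite/inner.asp?CONTAINERID=3&REFCI=109&LANGUAGE=1")
```

Whether the ASP parameters are honoured after the redirects depends on the server, so this may still return the index page rather than the requested one.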