
I'm struggling with getForm: my query gets redirected. I've experimented with cookiefile and followlocation, as suggested in other Stack Overflow topics, but with no result.

My code:

  getForm("http://korpus.pl/poliqarp/poliqarp.php",
          query = "pies", corpus = "2", showMatch = "1", showContext = "3",
          leftContext = "5", rightContext = "5", wideContext = "50", hitsPerPage = "10",
          .opts = curlOptions(
            verbose = TRUE,
            followlocation = TRUE
          )
  )

Am I right that I'm getting the content of the redirection page? If so, how can I bypass it?

KkK

1 Answer

library(RCurl)

# One handle shared by every request, so cookies persist between calls
curl = getCurlHandle(cookiefile = "", verbose = TRUE, followlocation = TRUE)

getForm("http://korpus.pl/poliqarp/poliqarp.php",
        query = "pies", corpus = "2", showMatch = "1", showContext = "3",
        leftContext = "5", rightContext = "5", wideContext = "50", hitsPerPage = "10",
        .opts = curlOptions(
          verbose = TRUE,
          followlocation = TRUE
        ),
        curl = curl)


test1 <- getURL("http://korpus.pl/poliqarp/poliqarp.php", curl = curl)
test2 <- getURL("http://korpus.pl/poliqarp/poliqarp.php", curl = curl)

With a bit of persuasion, test2 should contain the results.

curl is a handle that persists across calls. Setting cookiefile (here to an empty string, so no file is read) tells RCurl to store cookies in the handle. You can inspect the handle with getCurlInfo(curl). For example:

> cat(getCurlInfo(curl)$cookielist)
korpus.pl   FALSE   /   FALSE   0   PHPSESSID   ark8hbi13e2c4qrp51aq51nj62
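If you want to check for the session cookie in code rather than by eye, here is a minimal sketch in base R. It assumes each cookielist entry is a tab-separated line in the Netscape cookie-file layout (domain, subdomain flag, path, secure flag, expiry, name, value). The helper parse_cookie and the field labels are names made up for this example, not part of RCurl.

```r
# One entry as it appears in getCurlInfo(curl)$cookielist (tab-separated).
cookie_line <- "korpus.pl\tFALSE\t/\tFALSE\t0\tPHPSESSID\tark8hbi13e2c4qrp51aq51nj62"

# Labels chosen for this example, following the Netscape cookie-file column order.
fields <- c("domain", "include_subdomains", "path", "secure",
            "expires", "name", "value")

# Hypothetical helper: split one cookie line into a named list.
parse_cookie <- function(line) {
  parts <- strsplit(line, "\t", fixed = TRUE)[[1]]
  setNames(as.list(parts), fields)
}

cookie <- parse_cookie(cookie_line)
cookie$name   # "PHPSESSID"
```

With a live handle you would pass an element of getCurlInfo(curl)$cookielist instead of the hard-coded line.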

The getForm call sets the important cookie PHPSESSID. The first getURL results in:

> library(XML)
> htmlParse(test1)['//h3'][[1]]
<h3>This page will <a href="poliqarp.php">refresh</a> automatically in a second</h3> 

The page says it will refresh automatically, probably via JavaScript, so you need to perform that refresh manually by issuing another call.
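Rather than hard-coding two getURL calls, the manual refresh can be wrapped in a small retry loop. This is only a sketch: fetch_until_ready and is_interim are hypothetical helpers (not RCurl functions), and it assumes the interim page can be recognised by the text seen in test1.

```r
# Hypothetical helper: call `fetch` until the page it returns no longer
# looks like the interim "refresh" page, or give up after `max_tries`.
fetch_until_ready <- function(fetch, is_interim, max_tries = 5, wait = 1) {
  for (i in seq_len(max_tries)) {
    page <- fetch()
    if (!is_interim(page)) return(page)
    Sys.sleep(wait)  # give the server a moment before asking again
  }
  stop("still getting the interim page after ", max_tries, " tries")
}

# Assumption: the interim page always contains this fragment of the <h3>.
is_interim <- function(page) {
  grepl("refresh</a> automatically", page, fixed = TRUE)
}

# Usage with the handle created earlier (needs network access):
# result <- fetch_until_ready(
#   function() getURL("http://korpus.pl/poliqarp/poliqarp.php", curl = curl),
#   is_interim
# )
```

Passing the fetch step in as a function keeps the loop independent of RCurl, so it can be tried out with a stub before hitting the real server.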

jdharrison
  • Thank you very much for your answer! It works! Just to clarify: can you tell me what exactly getCurlHandle does? I suppose we are creating a "ghost" cookie there, which is then followed by getURL. And do we need to call getURL twice? – KkK Apr 13 '14 at 11:34