
I would like to post data to a web form from R, and retrieve the result. Is this possible at all?

In particular, I would like to pass a text file to this website http://ionspectra.org/aristo/batchmode/ and retrieve the result.

The website's form is declared as follows:

<form action="../batchreport/" method="post" enctype="multipart/form-data"><div style='display:none'><input type='hidden' name='csrfmiddlewaretoken' value='d9c49e206913e5d8b515bc9705ca2e09' />

First, I would like to set the "format" radio button to Tab-delimited:

<input type="radio" name="format" value="tsv" /> Tab-delimited <br/>

Then I would like to upload a given file:

<input type="file" name="batchfile" size="20"><br/>

Then click the submit button:

<input type="submit" value="Ontologize!" />

And finally, retrieve the resulting text file.
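The form declares enctype="multipart/form-data", so the browser sends each field (the hidden token, the format radio button and the file) as a separate part of the request body. The base-R sketch below builds such a body by hand only to show its shape. build_multipart() and the boundary string are hypothetical. In practice, httr's POST() with a body list containing upload_file(), or RCurl's postForm(), assembles this for you.

```r
# Hypothetical helper: assemble a multipart/form-data body by hand to show
# what submitting the form above puts on the wire.
build_multipart <- function(fields, file_field, file_name, file_text,
                            boundary = "----RFormBoundary7MA4YWxk") {
  crlf <- "\r\n"
  parts <- character(0)
  # one part per ordinary form field (hidden token, radio button, ...)
  for (nm in names(fields)) {
    parts <- c(parts,
               paste0("--", boundary, crlf,
                      'Content-Disposition: form-data; name="', nm, '"', crlf,
                      crlf,
                      fields[[nm]], crlf))
  }
  # the file part also carries a filename and a content type
  parts <- c(parts,
             paste0("--", boundary, crlf,
                    'Content-Disposition: form-data; name="', file_field,
                    '"; filename="', file_name, '"', crlf,
                    "Content-Type: text/plain", crlf,
                    crlf,
                    file_text, crlf))
  # the body ends with the closing delimiter "--boundary--"
  paste0(paste(parts, collapse = ""), "--", boundary, "--", crlf)
}

body <- build_multipart(
  fields = list(csrfmiddlewaretoken = "arbitrarytoken", format = "tsv"),
  file_field = "batchfile", file_name = "example.txt",
  file_text = "placeholder file contents"
)
cat(body)
```

The matching request header would be `Content-Type: multipart/form-data; boundary=----RFormBoundary7MA4YWxk`.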

The question is: can this be scripted from R, and if so, with which package? Could it be done with RCurl's postForm, perhaps? If so, what would the syntax be in this case?
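On the RCurl question, here is a sketch of what a postForm() call might look like. It has not been tested against the site. It assumes, as both answers below do, that the server only checks that the csrftoken cookie matches the csrfmiddlewaretoken field. submit_batch() is a hypothetical wrapper. fileUpload() marks the file field, and style = "HTTPPOST" requests a multipart upload.

```r
# Hypothetical wrapper around RCurl::postForm for this form (not run here).
# Assumption: any token works as long as cookie and form field agree.
submit_batch <- function(path, token = "arbitrarytoken") {
  res <- RCurl::postForm(
    "http://ionspectra.org/aristo/batchreport/",
    csrfmiddlewaretoken = token,          # the hidden field from the form
    format = "tsv",                       # the "Tab-delimited" radio button
    batchfile = RCurl::fileUpload(path),  # the file input
    .opts = list(cookie = paste0("csrftoken=", token)),  # matching cookie
    style = "HTTPPOST"                    # multipart/form-data POST
  )
  # postForm may return raw bytes for some content types; decode if so
  if (is.raw(res)) res <- rawToChar(res)
  # the response body is tab-delimited text
  read.delim(text = as.character(res), stringsAsFactors = FALSE)
}

# Usage (requires RCurl and network access):
# out <- submit_batch("example.txt")
```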

Any advice welcome!

cheers, Tom

Tom Wenseleers

2 Answers


This is a little trickier than usual because it's a Django website, so we need to deal with Django's Cross-Site Request Forgery (CSRF) protection by obtaining a CSRF token first.

Here's how to do it with httr, using an example input file saved locally as example.txt:

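library(httr)
# GET the form page: Django sets the CSRF token as a "csrftoken" cookie
page <- GET(url='http://ionspectra.org/aristo/batchmode/')
ck <- cookies(page)   # data frame of cookies, with name and value columns
csrf <- ck$value[ck$name == 'csrftoken']
# POST the form fields; httr reuses its handle for this host, so the cookie is sent back too
res <- POST(url='http://ionspectra.org/aristo/batchreport/', 
            body=list(batchfile=upload_file('example.txt'),
                      format='tsv',
                      csrfmiddlewaretoken=csrf))
# the response is tab-delimited text
out <- read.delim(text=content(res, as='text'), 
                  stringsAsFactors=FALSE)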

The GET call retrieves the CSRF token (Django sends it as a cookie along with the form page), which is needed for the subsequent POST call.
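What the GET actually delivers is a Set-Cookie response header. httr parses that header for you and exposes the result through cookies(). To show what the parsing involves, here is a base-R sketch that pulls the csrftoken value out of a header string. The header below is hypothetical (it reuses the token value from the question's form tag), and cookie_value() is an illustrative helper, not part of httr.

```r
# Hypothetical Set-Cookie header of the kind Django sends with the form page
set_cookie <- paste0("csrftoken=d9c49e206913e5d8b515bc9705ca2e09; ",
                     "expires=Thu, 01-Jan-2015 00:00:00 GMT; Path=/")

# Return the value of a named cookie from a Set-Cookie header string,
# or NA if the header sets a different cookie
cookie_value <- function(header, name) {
  # the first ';'-separated chunk is "name=value"; the rest are attributes
  pair <- trimws(strsplit(header, ";", fixed = TRUE)[[1]][1])
  kv <- strsplit(pair, "=", fixed = TRUE)[[1]]
  if (kv[1] == name) paste(kv[-1], collapse = "=") else NA_character_
}

csrf <- cookie_value(set_cookie, "csrftoken")
csrf  # "d9c49e206913e5d8b515bc9705ca2e09"
```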

jbaums

This also works and does not require the GET request. The ionspectra website sets a csrftoken cookie when you load the web form, and the form sends the same token back to the server in the hidden csrfmiddlewaretoken field when you submit it. The server then checks that the two match. The code below therefore sets the cookie itself with set_cookies(csrftoken=...), giving it the same value as csrfmiddlewaretoken in the body of the POST. As you can see, the token can be just about anything, as long as the two values agree.
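The check described above amounts to comparing two strings that the client supplies. The hypothetical csrf_ok() function below is a deliberately simplified model of it. Django's real middleware does more than this, so treat it as an illustration of the idea rather than the actual implementation.

```r
# Simplified, hypothetical model of the cookie-vs-form-field comparison;
# not Django's actual CSRF middleware.
csrf_ok <- function(cookie_token, form_token) {
  !is.null(cookie_token) && !is.null(form_token) &&
    nzchar(cookie_token) && identical(cookie_token, form_token)
}

csrf_ok("arbitrarytoken", "arbitrarytoken")  # TRUE: values match
csrf_ok("arbitrarytoken", "somethingelse")   # FALSE: mismatch
csrf_ok(NULL, "arbitrarytoken")              # FALSE: no cookie sent
```

Because both values come from the client, a non-browser client such as R can send the same arbitrary string in both places. The protection is aimed at forged requests from other sites, which cannot read this site's cookie and so cannot put a matching value in the form field.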

library(httr)
# download example dataset and save as file "example.txt"
data <- readLines("http://ionspectra.org/static/aristo/example.txt")
writeLines(data, "example.txt")
# POST request; z$content is the returned content, in raw format
z <- POST(url="http://ionspectra.org/aristo/batchreport/",
       set_cookies(csrftoken="arbitrarytoken"),
       body=list(csrfmiddlewaretoken="arbitrarytoken",
                 format="tsv",
                 filter="filter",
                 submit="Ontologize!",
                 batchfile=upload_file("example.txt")))

df <- read.csv(text=rawToChar(z$content), header=TRUE, sep="\t")
head(df)
#   scan.       title score    ChEBI_ID         ChEBI_Name  N   AUC Est..Precision Correct.
# 1     8 CHEBI:34205 0.781 CHEBI:53156 polychlorobiphenyl 24 0.990              1     True
# 2     8 CHEBI:34205 0.755 CHEBI:35446     chlorobiphenyl 27 0.990              1     True
# 3     8 CHEBI:34205 0.708 CHEBI:22888          biphenyls 38 0.943              1     True
# 4     8 CHEBI:34205 0.698 CHEBI:36686        chloroarene 40 0.966              1     True
# 5     8 CHEBI:34205 0.694 CHEBI:36820      ring assembly 49 0.827              1     True
# 6     8 CHEBI:34205 0.681 CHEBI:50887          haloarene 44 0.955              1     True
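You can try the final step offline: decode the raw bytes in z$content and parse them as tab-delimited text. In this sketch, raw_body is a hypothetical stand-in for z$content, built from a few values in the output above. read.delim() is equivalent to read.csv(header=TRUE, sep="\t").

```r
# Hypothetical stand-in for z$content: the raw bytes of a small
# tab-delimited response, using a few values from the output above
tsv <- paste("title\tscore\tChEBI_ID\tChEBI_Name",
             "CHEBI:34205\t0.781\tCHEBI:53156\tpolychlorobiphenyl",
             "CHEBI:34205\t0.755\tCHEBI:35446\tchlorobiphenyl",
             sep = "\n")
raw_body <- charToRaw(tsv)

# Decode the bytes to a string, then parse it as tab-delimited text
df <- read.delim(text = rawToChar(raw_body), stringsAsFactors = FALSE)
str(df)
```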
jlhoward